Mining Chemistry and Chemical Engineering Resources on the Web, and Cheminformatics

ASC 44, pp. 433–438, 2007

Citation: Zhaojie Xia, Li Guo, Chunyang Liang, Xiaoxia Li, Zhangyuan Yang, Focused Crawling for Retrieving Chemical Information, Advances in Soft Computing, Innovations in Hybrid Intelligent Systems, ASC 44, pp. 433–438, 2007
Title: Focused Crawling for Retrieving Chemical Information
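Authors: Zhaojie Xia, Li Guo, Chunyang Liang, Xiaoxia Li, Zhangyuan Yang; High-Performance Computing and Cheminformatics Group, State Key Laboratory of Multiphase Complex Systems, Institute of Process Engineering, Chinese Academy of Sciences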
Keywords: chemical search engine; web crawling; chemistry-focused crawler; machine learning
Abstract: The exponential growth of resources available on the Web has made it important to develop tools for searching it efficiently. This paper proposes an approach to chemical information discovery based on focused crawling. Combinations of different feature representations and classification algorithms were compared as the basis for implementing focused crawlers: Latent Semantic Indexing (LSI) and Mutual Information (MI) were used to extract features from documents, while Naive Bayes (NB) and Support Vector Machines (SVM) were used to compute content relevance scores. The combination of LSI and SVM was found to provide the best solution.
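To make the crawling strategy concrete, here is a minimal sketch of a best-first focused crawler in Python. It is not the authors' implementation. It assumes an in-memory toy link graph (`WEB`) in place of real HTTP fetching, uses raw word counts instead of LSI or MI features, and scores page relevance with a multinomial Naive Bayes classifier. That is one of the two classifiers the paper compares, although the paper reports LSI with SVM as the best combination. All names (`focused_crawl`, `NaiveBayesRelevance`, `WEB`, `threshold`) are hypothetical.

```python
import heapq
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesRelevance:
    """Multinomial Naive Bayes with Laplace smoothing.

    Labels are 1 (on-topic, e.g. chemistry) and 0 (off-topic); the training
    set must contain both labels.
    """

    def fit(self, docs, labels):
        self.counts = {0: Counter(), 1: Counter()}
        for doc, y in zip(docs, labels):
            self.counts[y].update(tokenize(doc))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        n_docs = Counter(labels)
        self.log_prior = {y: math.log(n_docs[y] / len(labels)) for y in (0, 1)}
        self.totals = {y: sum(self.counts[y].values()) for y in (0, 1)}
        return self

    def _log_joint(self, tokens, y):
        # log P(y) + sum of log P(token | y); words unseen in training are ignored.
        v = len(self.vocab)
        return self.log_prior[y] + sum(
            math.log((self.counts[y][t] + 1) / (self.totals[y] + v))
            for t in tokens if t in self.vocab
        )

    def score(self, text):
        """Return the posterior P(relevant | text), computed stably in log space."""
        tokens = tokenize(text)
        rel, irr = self._log_joint(tokens, 1), self._log_joint(tokens, 0)
        m = max(rel, irr)
        return math.exp(rel - m) / (math.exp(rel - m) + math.exp(irr - m))


def focused_crawl(seeds, fetch, scorer, max_pages=10, threshold=0.5):
    """Best-first crawl: always expand the frontier URL with the highest priority.

    fetch(url) must return (page_text, outgoing_links). Links found on a page
    inherit that page's relevance score as their priority.
    """
    frontier = [(-1.0, url) for url in seeds]  # heapq is a min-heap, so negate
    heapq.heapify(frontier)
    seen = set(seeds)
    relevant, visited = [], 0
    while frontier and visited < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = fetch(url)
        visited += 1
        s = scorer.score(text)
        if s < threshold:
            continue  # prune: do not follow links out of off-topic pages
        relevant.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-s, link))
    return relevant


# Hypothetical toy web standing in for real HTTP fetching: url -> (text, links).
WEB = {
    "seed": ("chemistry portal molecule reaction", ["a", "b"]),
    "a": ("organic synthesis reaction catalyst molecule", ["c"]),
    "b": ("football match score league", ["d"]),
    "c": ("polymer molecule spectroscopy", []),
    "d": ("chemical compound database", []),
}

scorer = NaiveBayesRelevance().fit(
    ["molecule reaction catalyst chemistry", "polymer spectroscopy compound chemical",
     "football league match", "movie music celebrity"],
    [1, 1, 0, 0],
)
print(focused_crawl(["seed"], lambda u: WEB[u], scorer))  # ['seed', 'a', 'c']
```

On this toy graph the crawler keeps `seed`, `a` and `c` and prunes `b`. Page `d` is on-topic but is never reached, because its only in-link comes from an off-topic page. This is the usual trade-off of pruning at a fixed threshold. Swapping in LSI features and an SVM, as the paper recommends, would change only the `scorer` object, not the crawl loop.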