Mining of Chemistry and Chemical Engineering Resources on the Web, and Chemoinformatics

Xiaoxia Li, et al., Searching Internet Chemical Information – From Surface Web to Deep Web, The 13th Asian Chemical Congress, Sep 13-15, 2009, Shanghai (invited session talk)

Title: Searching Internet Chemical Information – From Surface Web to Deep Web
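Authors: LI Xiaoxia, YUAN Xiaolong, XIA Zhaojie, NIE Fengguang, GUO Li; High-Performance Computing and Chemoinformatics Group, State Key Laboratory of Multiphase Complex Systems, Institute of Process Engineering, Chinese Academy of Sciences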
Keywords: Chemistry Search Engine, Chemistry Deep Web, Data extraction, Search engine, Web crawling, Deep Web search
Abstract: Since the rise of the World Wide Web around 1995, the Internet has become the largest, and sometimes the only, source of chemical information. Despite the unprecedented convenience of getting information online, developing proper tools for finding scholarly chemical information on the Internet remains a challenge because of the Web's dynamic and distributed nature and its huge size. From the standpoint of how general-purpose search engines access and index content, the Web can be divided into the surface Web and the Deep Web. The surface Web is the portion of the World Wide Web that conventional search engines reach and index by following and analyzing hyperlinks; the part of the Web that consists of databases and cannot be reached this way is called the Deep Web. Although they are heavily used every day, general-purpose search engines mainly cover the chemistry surface Web, and they usually offer better recall but lower precision, so users must refine their search strategies to overcome this limitation. Because of the vast variety of database structures and possible search terms, Google and other IT companies are still developing solutions for searching the Deep Web, and these efforts have not yet covered the chemistry Deep Web. Besides general-purpose search engines, chemistry-focused tools are another class of everyday tools for searching the chemistry Web. This presentation gives an overview of our efforts in developing chemistry-oriented tools for searching both the chemistry surface Web and the chemistry Deep Web, including ChIN (http://chin.csdl.ac.cn/), a chemistry Web directory that has been running for 10 years and has served more than 260,000,000 requests; ChemEngine, a prototype chemistry search engine; and ChemDB Portal (http://www.chemdb-portal.cn/), a prototype chemistry Deep Web search engine.
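The distinction between the surface Web and the Deep Web can be illustrated with a minimal sketch, assuming a hypothetical in-memory toy Web. The page names (`index.html`, `solvents.html`, `search_form.html`), the `COMPOUND_DB` table, and the functions `crawl_surface_web` and `query_deep_web` are all illustrative inventions, not part of ChIN, ChemEngine, or ChemDB Portal. A crawler that only follows hyperlinks discovers the static pages, including the search form itself, but never the database records behind the form; those records come back only when a query is submitted.

```python
from collections import deque

# Hypothetical toy Web: each static page maps to the pages it links to.
LINKS = {
    "index.html": ["solvents.html", "search_form.html"],
    "solvents.html": ["index.html"],
    "search_form.html": [],  # a query form: no hyperlinks lead into its database
}

# Hypothetical database behind search_form.html. Its records have no static
# URLs, so they are part of the (toy) Deep Web.
COMPOUND_DB = {
    "ethanol": {"formula": "C2H6O", "boiling_point_c": 78.4},
    "benzene": {"formula": "C6H6", "boiling_point_c": 80.1},
}


def crawl_surface_web(seed):
    """Breadth-first crawl that only follows hyperlinks, like a conventional engine."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for target in LINKS.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def query_deep_web(term):
    """Deep Web content is reached by submitting a query term, not by following links."""
    return COMPOUND_DB.get(term.lower())


if __name__ == "__main__":
    print(sorted(crawl_surface_web("index.html")))
    print(query_deep_web("Ethanol"))
```

Running the script lists the three linked pages and then prints the ethanol record. No compound record ever appears among the crawled pages, which mirrors why engines that rely on hyperlink traversal alone miss database-backed chemistry content.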