
Research On Information Retrieval Model Based On Deep Web

Posted on: 2009-08-27 | Degree: Master | Type: Thesis
Country: China | Candidate: B H Wu
GTID: 2178360245954996 | Subject: Computer application technology
Abstract/Summary:
With the rapid development and spread of the Internet, people's understanding of and requirements for information access have changed: they need faster and more accurate access to the substantial information available on the Web. More and more traditional resources are being moved online, which has led to rapid growth in the number of online resources, and traditional resource retrieval methods can no longer meet users' needs. Because of its power and ease of use, the Web search engine has become the most frequently used tool for organizing and retrieving information. However, conventional Web search engines cannot find all the information on the Internet, because some resources, known as the Deep Web, are typically reachable only by submitting queries through search forms rather than by following hyperlinks. Taking these hidden Deep Web resources as a starting point, it is therefore important to study how to make full use of the information on the Web.

Starting from the current state of information resources on the Internet, this thesis presents a systematic, in-depth analysis of the distribution and structure of the Deep Web. To address the low information coverage of conventional search engines, it designs and implements a Deep Web crawler that can discover and download more pages from the Internet, and it proposes an information retrieval framework based on this crawler.
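The abstract does not say how the crawler reaches pages hidden behind search forms. One common approach is query-based harvesting: submit keywords to the form, download the result records, and pick the next keyword from the text collected so far. The Python sketch below illustrates that idea under stated assumptions; it is not the author's implementation. `HiddenSite`, `crawl_deep_web`, the per-query result limit, and the "most frequent unused word" selection rule are all hypothetical choices made for illustration, and the site is simulated in memory instead of being accessed over HTTP.

```python
from collections import Counter


class HiddenSite:
    """Hypothetical stand-in for a site whose records are reachable only
    through a keyword search form (no static links point to them)."""

    def __init__(self, records, page_limit=2):
        self.records = records        # record id -> record text
        self.page_limit = page_limit  # max results returned per query

    def search(self, keyword):
        """Simulate submitting the search form with a single keyword."""
        hits = [rid for rid, text in sorted(self.records.items())
                if keyword in text.split()]
        return hits[: self.page_limit]


def crawl_deep_web(site, seed_keyword, max_queries=10):
    """Query-based harvesting: issue a query, download the result records,
    then choose the next keyword from the text downloaded so far."""
    downloaded = {}  # record id -> text
    issued = set()   # keywords already submitted
    keyword = seed_keyword
    for _ in range(max_queries):
        if keyword is None:
            break
        issued.add(keyword)
        for rid in site.search(keyword):
            downloaded.setdefault(rid, site.records[rid])
        # Assumed selection rule: most frequent word not yet used as a query.
        counts = Counter(word for text in downloaded.values()
                         for word in text.split() if word not in issued)
        keyword = counts.most_common(1)[0][0] if counts else None
    return downloaded, issued


# Usage on a tiny simulated source; record 5 shares no words with the others.
records = {
    1: "deep web search form",
    2: "web crawler form query",
    3: "crawler query database",
    4: "database record hidden",
    5: "isolated page",
}
site = HiddenSite(records, page_limit=2)
pages, queries = crawl_deep_web(site, "web")
print(sorted(pages), len(queries))
```

With this sample data the crawl reaches records 1 to 4 but never record 5, and several queries (for example "form") return only records already downloaded. Both effects show why coverage and crawling cost depend on how queries are chosen, which is the kind of trade-off item (5) below refers to.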
The main work of the thesis can be summarized as follows:
(1) By analyzing the shortcomings of conventional search engines, it shows that defects in their Web crawlers lead to low information coverage.
(2) It studies the characteristics of the resources hidden in the Deep Web.
(3) It proposes an information retrieval framework based on the Deep Web and defines its purposes and features.
(4) It designs and builds a Deep Web crawler based on the page collection mechanism of the Deep Web.
(5) It improves the crawler so that it collects more pages with fewer resources.
(6) It proposes an improved Chinese word segmentation algorithm and builds a prototype system on top of an existing full-text indexing library.
Experimental results show that the system is effective.
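Item (6) combines Chinese word segmentation with full-text indexing, but the abstract does not describe the improved segmentation algorithm or name the indexing library. As a hedged illustration only, the Python sketch below uses forward maximum matching (FMM), a standard dictionary-based baseline and not the thesis's own algorithm, and a plain in-memory inverted index standing in for the unnamed library. The dictionary, documents, and function names (`fmm_segment`, `build_inverted_index`, `search`) are hypothetical.

```python
from collections import defaultdict


def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position take the longest
    dictionary word (up to max_len chars); fall back to a single char."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def build_inverted_index(docs, dictionary):
    """Map each segmented term to the sorted ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in fmm_segment(text, dictionary):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query, dictionary):
    """AND query: ids of documents containing every segmented query term."""
    result = None
    for term in fmm_segment(query, dictionary):
        ids = set(index.get(term, []))
        result = ids if result is None else result & ids
    return sorted(result or [])


# Hypothetical dictionary: 信息 information, 检索 retrieval, 深层 deep,
# 网络 web/network, 搜索引擎 search engine, 爬虫 crawler.
DICT = {"信息", "检索", "信息检索", "深层", "网络", "搜索", "引擎", "搜索引擎", "爬虫"}
docs = {
    1: "深层网络信息检索",  # Deep Web information retrieval
    2: "搜索引擎爬虫",      # search engine crawler
    3: "深层网络爬虫",      # Deep Web crawler
}
index = build_inverted_index(docs, DICT)
print(fmm_segment(docs[1], DICT))  # ['深层', '网络', '信息检索']
print(search(index, "网络爬虫", DICT))  # [3]
```

Because FMM always prefers the longest match, document 1 is indexed under "信息检索" but not under "信息" alone, so a query for "信息" finds nothing. Limitations of this kind are one reason segmentation algorithms that improve on plain maximum matching are proposed.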
Keywords/Search Tags:information retrieval, Deep Web, retrieval model, search engine