Research and Implementation of a Focused Crawler for a Topic-Specific Search Engine

Posted on: 2007-08-08, Degree: Master, Type: Thesis
Country: China, Candidate: W W Liu, Full Text: PDF
GTID: 2208360185991479, Subject: Computer application technology
Abstract/Summary:
With the rapid growth of the Internet, the gap between the volume of information on the Web and people's ability to retrieve it keeps widening. This thesis addresses that problem by studying the focused crawler, a key component of a topic-specific search engine, whose search algorithm largely determines how well such an engine can be applied and developed. First, the basic theory of topic-specific search engines is briefly introduced, the role of the focused crawler within such an engine is described, and its working principle is analyzed. Next, several kinds of focused-crawler search algorithms are discussed, with detailed treatment of the algorithm based on Web hyperlink structure and the one based on page content. To judge the relevance between page content and the topic, the thesis uses a method based on the vector space model (VSM), which is widely applied in the field of text classification. Finally, a focused crawler called SoftSpider is proposed and designed. It uses a search algorithm that combines Web hyperlink structure with page content and improves on the Authorities and Hubs algorithm. The details of its structure and design are presented, its performance is tested, and the results are reported.
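The two building blocks named in the abstract can be illustrated with a minimal sketch. It assumes raw term-frequency vectors with cosine similarity for the VSM relevance judgment, and the textbook authorities-and-hubs iteration for link analysis. The abstract does not describe SoftSpider's term weighting or its improvement to the algorithm, so neither is reproduced here; the function names (`term_vector`, `cosine_relevance`, `hits`) and the toy link graph are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words vector of raw term counts (assumption: no IDF weighting)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_relevance(page_text, topic_text):
    """Cosine similarity between the page and topic term vectors, in [0, 1]."""
    page, topic = term_vector(page_text), term_vector(topic_text)
    dot = sum(count * topic[term] for term, count in page.items())
    norm = math.sqrt(sum(c * c for c in page.values())) * math.sqrt(
        sum(c * c for c in topic.values())
    )
    return dot / norm if norm else 0.0


def hits(links, iterations=20):
    """Textbook authorities-and-hubs iteration over {page: [linked pages]}.

    This is the unmodified algorithm, not the improved variant the
    thesis refers to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub score: sum of authority scores of the pages linked to.
        hub = {p: sum(auth[t] for t in links.get(p, ())) for p in pages}
        # Normalize both score sets to unit Euclidean length.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return auth, hub


if __name__ == "__main__":
    topic = "focused crawler for topic-specific search engine"
    print(cosine_relevance("a focused crawler visits topic pages", topic))
    auth, hub = hits({"a": ["c"], "b": ["c"], "c": []})
    print(max(auth, key=auth.get))
```

A crawler of this kind could, for example, follow links only from pages whose content relevance exceeds a chosen threshold and prioritize targets with high authority scores; how SoftSpider actually combines the two signals is not stated in the abstract.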
Keywords/Search Tags: Search Engine, Focused Crawler, Authorities and Hubs, Web Hyperlink Analysis, VSM