The Design And Implementation Of Topical Search Engine

Posted on: 2008-04-18 | Degree: Master | Type: Thesis
Country: China | Candidate: P Wang
GTID: 2178360272470092 | Subject: Computer software and theory
Abstract/Summary:
With the development of network technology and the rapid growth of information resources on the Internet, traditional search engines can no longer satisfy the demand for personalized information retrieval. A topical search engine searches only the Web resources related to a specific theme, so it performs better for specialized information searches. The core task in designing a topical search engine is designing a topical crawler, and the two most important parts of that task are improving the page filtration method and the crawling strategy.

After analyzing existing relevance judgment methods, this paper presents a new relevance judgment method that combines the vector space model, link tag analysis and metadata analysis. The results show that it may improve the speed of the crawler.

Based on a study of current crawling strategies, and with crawler performance in mind, this paper adopts a content-based crawling strategy. Such a strategy, however, has shortcomings such as "short sight". To improve it, a simple structure analysis method based on URL content is added to control the crawler: the crawling strategy combines the content-based decision with simple link analysis and URL tag information. This strategy can resolve the "short sight" shortcoming and improve the performance of the crawler.

Based on a survey of topical and traditional search engines, this thesis studies the architecture of a topical search engine and then presents its design and implementation. The experimental results show that the page filtration method and the crawling strategy can effectively control the crawling process, so that the crawler fetches only information related to the theme. The resulting topical search engine system meets its design requirements.
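The relevance judgment described above can be illustrated with a small sketch. It is not the thesis's method; it only assumes one plausible way of combining the three signals: the topic is a list of keywords, and the page body, the anchor text of the link that led to the page (link tag analysis), and the title plus meta keywords (metadata analysis) are each scored with cosine similarity in a term-frequency vector space model, then mixed with hypothetical weights and compared to a hypothetical threshold. The function names `page_relevance` and `is_relevant`, the weights and the threshold are all illustrative assumptions, and the tokenizer handles only ASCII words (a crawler for Chinese pages would need a word segmenter).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase ASCII word tokens (assumption: no segmentation for Chinese text)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def page_relevance(topic_terms, body, anchor_text="", title="", meta_keywords="",
                   weights=(0.5, 0.2, 0.3)):
    """Hypothetical weighted sum of body, anchor-text and metadata similarity to the topic."""
    topic = Counter(topic_terms)
    w_body, w_anchor, w_meta = weights
    s_body = cosine(topic, Counter(tokenize(body)))              # vector space model on page text
    s_anchor = cosine(topic, Counter(tokenize(anchor_text)))     # link tag (anchor text) signal
    s_meta = cosine(topic, Counter(tokenize(title + " " + meta_keywords)))  # metadata signal
    return w_body * s_body + w_anchor * s_anchor + w_meta * s_meta


def is_relevant(score, threshold=0.3):
    """Keep the page only if its combined score reaches the (assumed) threshold."""
    return score >= threshold
```

A linear combination is chosen here only because it is the simplest way to merge the three scores; in practice the weights and threshold would have to be tuned against pages labeled as on-topic or off-topic.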
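The crawling strategy described above (a content-based decision combined with simple link analysis and URL tag information) can be sketched as a best-first frontier. This is one possible interpretation, not the thesis's implementation: each discovered link gets a priority that mixes the relevance score of the page it was found on, the overlap between its anchor text and the topic terms, and the overlap between words in its URL and the topic terms. One common reading of "short sight" is that a purely content-based crawler judges a link only by the page it was found on; in this sketch the anchor-text and URL terms can still rank a link from an off-topic page reasonably high. The names `link_priority`, `url_tokens` and `Frontier`, the stop-word set and the weights are hypothetical.

```python
import heapq
import itertools
import re

# Hypothetical URL words that carry no topical meaning.
URL_STOP_WORDS = {"http", "https", "www", "com", "html", "htm"}


def url_tokens(url):
    """Split a URL into lowercase word tokens (host, path segments, query words)."""
    return set(re.findall(r"[a-z0-9]+", url.lower())) - URL_STOP_WORDS


def overlap(topic, tokens):
    """Fraction of topic terms that appear among the tokens (0.0 to 1.0)."""
    return len(topic & tokens) / len(topic) if topic else 0.0


def link_priority(topic, parent_score, anchor_text, url, weights=(0.5, 0.3, 0.2)):
    """Hypothetical priority: parent page relevance + anchor-text and URL term overlap."""
    w_parent, w_anchor, w_url = weights
    anchor = set(re.findall(r"[a-z0-9]+", anchor_text.lower()))
    return (w_parent * parent_score
            + w_anchor * overlap(topic, anchor)
            + w_url * overlap(topic, url_tokens(url)))


class Frontier:
    """Best-first crawl frontier: pop the highest-priority URL not yet queued."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = itertools.count()  # tie-breaker so equal priorities pop in insertion order

    def push(self, url, priority):
        if url in self._seen:  # each URL is queued at most once
            return
        self._seen.add(url)
        heapq.heappush(self._heap, (-priority, next(self._counter), url))  # negate for a max-heap

    def pop(self):
        neg_priority, _, url = heapq.heappop(self._heap)
        return url, -neg_priority

    def __len__(self):
        return len(self._heap)
```

For example, a link with anchor text "web crawler tutorial" pointing to a URL whose path contains "search-engine/crawler" would outrank a "cake recipes" link found on the same low-scoring page, so the crawler fetches it first.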
Keywords/Search Tags: Topical Search Engine, Web Crawler, Crawling Strategy, Page Filtration