Research And Implementation Of Intelligent Crawler System In Vertical Search Engine

Posted on: 2018-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: S Wang
Full Text: PDF
GTID: 2348330518496550
Subject: Computer Science and Technology
Abstract/Summary:
With the development of information technology, search engines have become the entrance to the Internet. Vertical search engines, which serve a single domain, attract wide attention and market demand because they collect and process data in depth and provide accurate, professional search services. The intelligent crawler is the component of a vertical search engine that performs data collection. Because such crawlers are tied to a specialized domain and cover a narrow range of content, they differ greatly from one another in structure and strategy. They also face problems such as the need for more accurate topic-relevance judgment and the need to handle a large number of small-scale acquisition tasks. This thesis studies the techniques related to the intelligent crawler in a vertical search engine and proposes an overall architecture. Using plug-in and distributed design principles, we implement the intelligent crawler system and carry out performance testing and verification of it.

The main work of this thesis is as follows:

(1) A text feature extraction method based on LDA is designed, a machine-learning-based approach to deciding topic relevance is proposed, and a link prediction model based on anchor-text features and web page topic relevance is established.
(2) A multi-strategy scheme for handling anti-crawling measures is designed, together with a proxy server filtering process.
(3) A two-layer, three-instance URL de-duplication scheme based on Bloom filters is designed. It offers high availability and persistence and de-duplicates massive numbers of URLs quickly and accurately.
(4) The remaining functions of the intelligent crawler are designed and coded to produce a complete system.

The function and performance of the intelligent crawler system are verified and tested by constructing an experimental topology environment and deploying the system. The results show that the design and implementation improve the intelligence and efficiency of the crawler.
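
To illustrate the idea behind item (3), the following is a minimal sketch of URL de-duplication with a single in-memory Bloom filter. It is not the thesis's two-layer, three-instance scheme, whose details the abstract does not give, and it has no persistence or high-availability support. The names `BloomFilter` and `should_crawl`, the SHA-256 double-hashing approach, and the 1% false-positive target are all assumptions made for this example.

```python
import hashlib
import math


class BloomFilter:
    """A simple in-memory Bloom filter (illustrative only, not the thesis's design)."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Standard sizing formulas: m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2.
        self.num_bits = max(
            1,
            math.ceil(-expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)),
        )
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "probably seen"; False means "definitely not seen".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def should_crawl(url: str, seen: BloomFilter) -> bool:
    """Return True the first time a URL is offered, False if it was (probably) seen before."""
    if url in seen:
        return False
    seen.add(url)
    return True


if __name__ == "__main__":
    seen = BloomFilter(expected_items=1000)
    print(should_crawl("http://example.com/a", seen))  # first visit
    print(should_crawl("http://example.com/a", seen))  # repeat visit
```

A Bloom filter never reports a URL it has stored as unseen, but it can occasionally report a new URL as already seen. For a crawler this means a small fraction of pages may be skipped, in exchange for a fixed and compact memory footprint.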
Keywords/Search Tags: vertical search, intelligent crawler, relevance of topic, system design