
Research And Implementation Of Vertical Search Engine Based On Distributed High-Precision Collector

Posted on: 2012-08-31; Degree: Master; Type: Thesis
Country: China; Candidate: B Zhou
GTID: 2178330335960860; Subject: Computer application technology
Abstract/Summary:
With the explosive growth of web pages, search engines have become increasingly valuable. Several search engine systems already integrate information from the Internet and provide navigation and search services. However, general-purpose search engines cannot meet the demand for precise information in specialized fields or from specialized user groups. The diversity of information needs means that search services will become segmented and will have to offer high-precision service tailored to individual industries, which is driving the rapid development of vertical search engines.

In this thesis, we present a vertical search engine for Internet forums, blogs, and news websites. It is a platform where users can browse hot topics and search for popular information. It is intended to complement general search engines by supplying users with high-precision information in a specific field. The main work and contributions of this research are as follows:
1. A high-precision collection method;
2. An architecture and associated protocol for a distributed crawler;
3. A distribution strategy based on crawling period;
4. An index and retrieval module built on Lucene that supports field retrieval and batch updating of the index.

The search engine consists of three modules: a crawler module, an index module, and a retrieval module. The design and implementation of the distributed vertical crawler is the core of the research and the most important difference from a general search engine. The vertical crawler follows from the concept of vertical search. Compared with a general crawler, a vertical crawler aims to obtain the most valuable web resources with the fewest system resources, filter out useless information as far as possible, and ultimately supply users with high-precision information.
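The abstract does not spell out how the crawling-period based distribution strategy works, so the following is only an illustrative sketch of one plausible reading, not the thesis's method. It assumes each site has a known crawl period in seconds, treats 1/period as that site's fetch load, and greedily gives each site to the crawler node with the lowest load so far. The function name `assign_sites`, the site names, and the node names are all hypothetical.

```python
import heapq
from typing import Dict, List


def assign_sites(periods: Dict[str, float], nodes: List[str]) -> Dict[str, List[str]]:
    """Assign sites to crawler nodes, balancing estimated fetch load.

    Assumption (not from the thesis): a site's load is 1 / crawl period,
    i.e. sites that must be re-crawled more often cost more.
    """
    if not nodes:
        raise ValueError("at least one crawler node is required")

    # Min-heap of (current load, tie-break index, node name).
    heap = [(0.0, i, node) for i, node in enumerate(nodes)]
    heapq.heapify(heap)
    assignment: Dict[str, List[str]] = {node: [] for node in nodes}

    # Place the most frequently crawled sites first (shortest period first).
    for site, period in sorted(periods.items(), key=lambda kv: (kv[1], kv[0])):
        if period <= 0:
            raise ValueError(f"crawl period for {site!r} must be positive")
        load, order, node = heapq.heappop(heap)
        assignment[node].append(site)
        heapq.heappush(heap, (load + 1.0 / period, order, node))

    return assignment


# Hypothetical crawl periods in seconds.
example_periods = {"news": 60, "forum": 120, "blog_a": 3600, "blog_b": 3600}
example_plan = assign_sites(example_periods, ["node1", "node2"])
print(example_plan)
```

In this example the frequently refreshed news site occupies one node on its own, while the forum and the two slowly changing blogs share the other. A real strategy would also need to handle nodes joining or leaving and changes in a site's period.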
Keywords/Search Tags:Search Engine, Vertical, High-precision information, Crawler, Index, Retrieval