Crawl Technology Research In Vertical Search Engine

Posted on: 2009-12-17
Degree: Master
Type: Thesis
Country: China
Candidate: C Liu
GTID: 2178360242982996
Subject: Computer application technology
Abstract/Summary:
A vertical search engine targets a specific domain and provides valuable information and related services within it. It is a subdivision and extension of the general search engine, and a new way of delivering information services that matches how professional users work. This thesis studies crawl technology for search engines, focusing on the crawling problems that arise in vertical search: the Hidden Web, time-effectiveness, performance, and efficiency.

We first introduce the architecture of our vertical search crawl system and propose a distributed crawl framework based on extensible plug-ins. Both the distributed design and the plug-in mechanism make the system easy to extend in the future. We then discuss three questions in Hidden Web crawling and present a self-learning method for eliminating duplicate Chinese addresses in Hidden Web crawl results. Next, we develop query-triggered crawling to address the time-effectiveness problem. We also discuss and compare the crawl modes, crawl strategies, and crawl frequencies that can affect a vertical search crawl system; our system adopts the steady mode, the in-place strategy, and a combination of real-time crawling and fixed-frequency crawling.

According to our experiments, the proposed methods for duplicate elimination and time-effectiveness achieve better effectiveness and a better user experience.
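
The combination of real-time (query-triggered) crawling and fixed-frequency crawling described above can be illustrated roughly as follows. This is only a minimal sketch and not the system built in the thesis: the class name `HybridRecrawler`, the `fetch` callback, and both time thresholds are hypothetical, and "in-place" is assumed here to mean that a stored page is overwritten as soon as a fresh copy is fetched.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class HybridRecrawler:
    """Hypothetical scheduler mixing fixed-frequency and query-triggered crawls."""

    fetch: Callable[[str], str]      # page fetcher supplied by the caller (assumed)
    fixed_interval: float = 3600.0   # seconds between scheduled recrawls (assumed value)
    max_age_on_query: float = 300.0  # staleness tolerated at query time (assumed value)
    store: Dict[str, str] = field(default_factory=dict)         # current page copies
    fetched_at: Dict[str, float] = field(default_factory=dict)  # last crawl time per URL

    def _crawl(self, url: str, now: float) -> None:
        # In-place update: overwrite the stored copy immediately after fetching.
        self.store[url] = self.fetch(url)
        self.fetched_at[url] = now

    def tick(self, now: float) -> int:
        """Fixed-frequency pass: recrawl every known URL whose interval has elapsed."""
        due = [u for u, t in self.fetched_at.items() if now - t >= self.fixed_interval]
        for url in due:
            self._crawl(url, now)
        return len(due)

    def on_query(self, url: str, now: float) -> str:
        """Query-triggered pass: refresh first if the copy is missing or too old."""
        last = self.fetched_at.get(url)
        if last is None or now - last > self.max_age_on_query:
            self._crawl(url, now)
        return self.store[url]
```

In this sketch a user query only pays the cost of a fetch when the stored copy is older than `max_age_on_query`, while `tick` keeps rarely queried pages from aging indefinitely. Time is passed in explicitly so the behavior can be checked without a real clock.
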
Keywords/Search Tags: Vertical Search, Extensible, Hidden Web, Time-effectiveness