The Key Technology Research On Deep Web Directory Search Engine

Posted on: 2008-12-10  Degree: Master  Type: Thesis
Country: China  Candidate: L Gao  Full Text: PDF
GTID: 2178360218951486  Subject: Computer application technology
Abstract/Summary:
With the rapid development of the World Wide Web, a large and growing number of searchable databases have come online. Much of the dynamic information held in these databases sits behind query interfaces and cannot be retrieved by current search engine technology. This information is called the Deep Web. Deep Web information retrieval is still a young field of study and is receiving increasing attention. To meet users' need for Deep Web information, this thesis proposes a system architecture for a Deep Web directory search engine. Within this framework, we focus on the key issues of such an engine and propose relevant algorithms and models. The main research work includes: (1) We investigate the scale, distribution, and structure of Chinese Deep Web resources. (2) To overcome the limitations of traditional search engine crawlers in the Deep Web domain, we design a Deep Web focused crawler and present a method for judging whether a page contains a Deep Web query interface. (3) We adopt an efficient algorithm to acquire the contents of Web databases. By analysing the result pages, irrelevant information is removed and a summary of each Web database's contents is constructed. (4) Following the Yahoo Directory, we propose a method that combines query interface pages and database summaries to classify Deep Web resources. Finally, we design and implement a prototype Deep Web directory search engine system called Deep Searcher, and we carry out experiments on and analysis of the proposed algorithms.
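The abstract does not say how a query interface is judged, so the following is only a minimal sketch of one plausible rule-based approach, not the method used in the thesis. It assumes a form counts as a candidate query interface when it has at least one text field or select box and no password field (which would suggest a login form). The names `FormCollector`, `looks_like_query_interface`, and `find_query_interfaces` are hypothetical.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collects a small count of field types for each <form> in a page."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one feature dict per form seen
        self._current = None   # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            self._current = {"text": 0, "password": 0, "select": 0}
            self.forms.append(self._current)
        elif self._current is not None:
            if tag == "input":
                # An <input> without a type attribute defaults to a text field.
                kind = (attributes.get("type") or "text").lower()
                if kind in ("text", "search"):
                    self._current["text"] += 1
                elif kind == "password":
                    self._current["password"] += 1
            elif tag == "select":
                self._current["select"] += 1

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def looks_like_query_interface(form):
    """Assumed heuristic: reject login forms, require a searchable field."""
    if form["password"] > 0:
        return False
    return form["text"] + form["select"] >= 1


def find_query_interfaces(html):
    """Return the feature dicts of forms that pass the heuristic."""
    collector = FormCollector()
    collector.feed(html)
    return [f for f in collector.forms if looks_like_query_interface(f)]
```

A focused crawler could call `find_query_interfaces` on each fetched page and keep only pages where the returned list is non-empty. A real system would likely need richer features (form text, field labels, submission results) than this sketch uses.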
Keywords/Search Tags:Deep Web, Search Engine, Focused Crawler, Web Database Content Summary, Data Source Classification