
The Research And Design On Vertical Search Engine

Posted on: 2009-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: G L Li
GTID: 2178360245498650
Subject: Computer application technology
Abstract/Summary:
The amount of information on the Web has grown dramatically in recent years with the rapid development of the Internet. General search engines face increasing challenges in gathering and storing this information. Moreover, general search engines serve all Web users, and users with specialized needs are often not satisfied with the results: they want results that are more precise than general-purpose results. Vertical search engines have emerged to meet this need.

A vertical search engine consists of a spider, an indexer and a searcher. Unlike a general search engine, it gathers only information related to the search topic. While crawling the Web, the spider continually calculates a topic relevance ("topic interrelated") value for the current HTML page and uses that value to estimate whether the page is related to the search topic. In this way the spider avoids a great deal of junk information and finds HTML pages in a specific field effectively. Many experiments show that vertical search engines outperform general search engines in accuracy rate, recall rate and efficiency. Vertical search engines also cost less, because they handle fewer HTML pages. Anyone can build a high-quality, efficient vertical search engine with simple hardware.

This thesis first analyzes the system structure, working principles and key technologies of general and vertical search engines, and then introduces the research status and directions of vertical search engines. It analyzes in depth the distribution characteristics of topic-related HTML pages, and studies topic search strategies and the calculation of topic relevance. After designing the system structure, we use UML to derive object-oriented models of the spider and the indexer. Finally, we implement the spider and indexer programs in Java with Lucene. The vertical search engine runs stably on Tomcat.

Innovations of this thesis:
(1) It effectively reuses the core code of Lucene by improving and extending the Lucene source code.
(2) It independently develops a Chinese Analyzer module on top of Lucene.
(3) It optimizes the search strategy and designs a new search strategy based on Authorities and Hubs. Experiments show that the vertical search engine achieves a better accuracy rate and avoids the topic drift phenomenon.
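
The abstract does not give the formula the spider uses to score pages, so the following Java sketch is only an illustration of the general idea of scoring a page against a topic while crawling. It assumes a cosine similarity between the page's term counts and a hand-picked set of weighted topic terms. The class name, method names, the threshold and the English-only tokenizer are all hypothetical; a Chinese-language engine like the one described would need a word segmenter instead.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch; none of these names or formulas come from the thesis. */
final class TopicRelevanceSketch {

    /** Counts lowercase ASCII word occurrences in page text (markup assumed already stripped). */
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /**
     * Cosine similarity between the page's term counts and the topic term weights.
     * Assumes topic keys are lowercase and weights are non-negative, so the result is in [0, 1].
     */
    static double relevance(String pageText, Map<String, Double> topicWeights) {
        Map<String, Integer> counts = termCounts(pageText);
        double pageNorm = 0.0;
        for (int c : counts.values()) {
            pageNorm += (double) c * c;
        }
        double dot = 0.0;
        double topicNorm = 0.0;
        for (Map.Entry<String, Double> e : topicWeights.entrySet()) {
            double w = e.getValue();
            topicNorm += w * w;
            dot += w * counts.getOrDefault(e.getKey(), 0);
        }
        if (pageNorm == 0.0 || topicNorm == 0.0) {
            return 0.0; // empty page or empty topic: treat as unrelated
        }
        return dot / (Math.sqrt(pageNorm) * Math.sqrt(topicNorm));
    }

    /** The spider would keep the page and follow its links only when the score reaches the threshold. */
    static boolean shouldFollow(String pageText, Map<String, Double> topicWeights, double threshold) {
        return relevance(pageText, topicWeights) >= threshold;
    }

    public static void main(String[] args) {
        Map<String, Double> topic = new HashMap<>();
        topic.put("lucene", 1.0); // assumed topic terms and weights
        topic.put("index", 0.5);
        String page = "Building a search index with Lucene";
        System.out.println(relevance(page, topic));
        System.out.println(shouldFollow(page, topic, 0.3)); // 0.3 is an arbitrary threshold
    }
}
```

In a crawler loop, `shouldFollow` would be called on each fetched page before its outgoing links are queued, which is one simple way to skip off-topic pages. The threshold trades recall for precision and would have to be tuned experimentally.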
Keywords/Search Tags: vertical search engine, spider, Lucene, search strategy, topic interrelated, UML