
Research On Key Technology Of Vertical Search Engine

Posted on: 2017-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Zhang
GTID: 2308330503479767
Subject: Computer Science and Technology
Abstract/Summary:
Search engines emerged to meet people's need to find information, and well-known engines such as Baidu and Google have become part of daily life. However, the Internet now holds hundreds of millions of resources, more than a general search engine can cover, so it is difficult for users to find the information they need on a specific subject through a general search engine. Vertical search engines emerged to solve this problem by giving a particular group of users accurate retrieval on a specific topic.

This thesis introduces the research background, the basic principles of search engines, and the state of vertical search engine development in China and abroad. It then presents the theory of vertical search engines: their basic concepts, how they differ from general search engines, and the criteria for evaluating their performance. It describes the component modules of a vertical search engine and their functions, and studies in depth the core technologies involved, mainly crawler technology, structured extraction of page content, Chinese word segmentation, and Lucene indexing. These technologies are used in the information collection module, the information extraction module, the indexing module, and the user interface module.
The information collection module uses crawler technology to fetch data from the Internet; the information extraction module performs structured extraction on the downloaded pages; the indexing module builds an index database from the extracted structured information; and the user interface module provides a query interface and returns results to the user.

The main work and contributions of this thesis are as follows. First, the open-source crawler framework Heritrix is extended and improved so that it performs directed crawling of sports information; the APHash algorithm is introduced to improve the queue allocation strategy, which greatly improves Heritrix's crawling efficiency. Second, the topic thesaurus used by the JE segmentation tool is expanded with professional vocabulary such as sports brand names to form a domain-specific thesaurus, which greatly improves query accuracy. Third, building on the research into core vertical search engine technologies and functional modules, a prototype sports-oriented vertical search system is built that supports simple sports queries.
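
The abstract does not say how APHash is wired into Heritrix, so the following is only a minimal sketch of the general idea. It assumes that each URL string is hashed and mapped onto a fixed number of crawl queues, so that a directed crawl of a few sports sites is not funnelled into a handful of per-host queues. The class name `ApHashQueues`, the method names, the queue count, and the example URLs are all hypothetical. The hash shown is one commonly circulated variant of Arash Partow's AP hash and may differ from the one used in the thesis. No Heritrix API is used.

```java
final class ApHashQueues {

    // One commonly circulated variant of the AP hash (assumption: the
    // thesis may use a different variant). Even and odd character
    // positions are mixed with different shift patterns.
    static int apHash(String key) {
        int hash = 0;
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            if ((i & 1) == 0) {
                hash ^= (hash << 7) ^ c ^ (hash >>> 3);
            } else {
                hash ^= ~((hash << 11) ^ c ^ (hash >>> 5));
            }
        }
        return hash;
    }

    // Map a URL to a queue index in [0, queueCount).
    // floorMod keeps the result non-negative even when the hash is negative.
    static int queueFor(String url, int queueCount) {
        return Math.floorMod(apHash(url), queueCount);
    }

    public static void main(String[] args) {
        // Hypothetical URLs, for illustration only.
        String[] urls = {
            "http://sports.example.com/news/1.html",
            "http://sports.example.com/news/2.html",
            "http://sports.example.com/nba/index.html"
        };
        for (String url : urls) {
            System.out.println(url + " -> queue " + queueFor(url, 8));
        }
    }
}
```

Under this assumption, URLs from the same host can land in different queues and be processed by different worker threads in parallel. The trade-off is that per-host politeness limits are no longer enforced by the queue structure itself and would have to be handled separately.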
Keywords/Search Tags: Vertical search engine, Heritrix crawler, Chinese word segmentation, Lucene index