
Research And Implementation Of Topic Search Engine

Posted on: 2011-07-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Li
GTID: 2178330332471030
Subject: Computer application technology
Abstract/Summary:
With the rapid growth of information and the spread of network access, the Internet has become the largest repository of information resources. Finding the information one needs quickly and accurately in this vast amount of online information has become a problem that everyone faces. Search engines make it much easier for users to find useful information. However, current general-purpose search engines serve all users and therefore aim to return comprehensive results, which does not meet the needs of engineers in specialized fields. Topic-specific search engines emerged in response. Yet traditional search engines, topic search engines included, can only match the user's query input mechanically, so the query results are not ideal. In this thesis, we integrate ontology technology into the topic search process and use it for semantic query processing in order to improve the efficiency of search engine queries.

First, the thesis discusses the basic principles of search engines and the features of topic search engines, and points out that the key difference between them lies in their crawling strategies. It then analyzes the Lucene code, focusing on its loosely coupled structure, index composition, and internal data flow; the experiment shows that Lucene search is superior to the traditional search method.

Then, the thesis introduces the concept, classification, and description languages of ontologies. After studying ontology construction principles and comparing construction methods, it gives an ontology construction method: determine the relationships between domain concepts and the concept hierarchy, then add attributes, instances, and constraints to refine the relationships between concepts. Protege 3.4.1 is chosen to build the ontology using a top-down approach.

Finally, in the system implementation, the information collection module collects topic information by extending the existing Heritrix crawler framework; the pre-processing module uses Lucene to build an inverted index over the collected information and extends Lucene's word segmentation; and the query module combines the ontology to expand the query information. The feasibility of the design is verified by crawling the target sites, extracting information from the pages, and running final query tests. A comparison of query results between the experimental system and a traditional search engine shows that the former achieves a better precision rate and provides query hints. This work offers a new idea for combining search engines with ontology technology.
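The idea of combining an inverted index with ontology-based query expansion can be illustrated with a minimal sketch. It is not the thesis's implementation: it uses plain Python rather than Lucene, and the ontology (`ONTOLOGY`), the sample documents (`DOCS`), and all function names are hypothetical stand-ins chosen for illustration. The ontology here is a flat dictionary from a concept to related terms, far simpler than an ontology built in Protege.

```python
from collections import defaultdict

# Hypothetical toy ontology: each concept maps to related single-word terms.
ONTOLOGY = {
    "crawler": ["spider", "heritrix"],
    "ontology": ["taxonomy"],
}

# Hypothetical sample documents, keyed by document id.
DOCS = {
    1: "Heritrix collects topic pages",
    2: "A spider follows links",
    3: "Lucene builds the index",
}


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for a real analyzer."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def expand_query(query, ontology):
    """Return the query terms plus any terms the ontology relates to them."""
    terms = []
    for term in tokenize(query):
        terms.append(term)
        terms.extend(ontology.get(term, []))
    return terms


def search(query, index, ontology=None):
    """Return sorted ids of documents matching any (optionally expanded) term."""
    terms = expand_query(query, ontology) if ontology else tokenize(query)
    hits = set()
    for term in terms:
        hits |= index.get(term, set())
    return sorted(hits)


if __name__ == "__main__":
    index = build_inverted_index(DOCS)
    print(search("crawler", index))            # purely literal matching
    print(search("crawler", index, ONTOLOGY))  # matching after expansion
```

In this sketch, a literal search for "crawler" finds nothing because no document contains that word, while the expanded query also looks up "spider" and "heritrix" and so reaches documents 1 and 2. This is the kind of gap between mechanical matching and semantic matching that the abstract describes.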
Keywords/Search Tags:search engine, subject search, ontology, web crawler