
Research Of Ontology-based Topic-specific Search Engine Technique

Posted on: 2012-12-16 | Degree: Master | Type: Thesis
Country: China | Candidate: G C Lu | Full Text: PDF
GTID: 2178330335451065 | Subject: Computer software and theory
Abstract/Summary:
The Internet is an important source of information in modern society. How to organize the vast amount of information on the Internet efficiently so that users can retrieve it, and how to return accurate results to them, has become a research hotspot. Search engines are the main solution to this problem, but as the amount of data on the Internet keeps growing, traditional search engines are increasingly unable to meet users' specialized and diverse needs. This has led to the emergence of the topic-specific search engine. As the name suggests, a topic-specific search engine serves a specific field, and users use it to retrieve information on a particular subject.

The focused crawler is the most important part of a topic-specific search engine, and web page classification is a key component of the focused crawler: it determines whether a web page is relevant to the subject. Traditional document classification judges the relevance of a web page only by literal term matching, whereas the ontology-based web page classification in this paper allows pages to be crawled according to their semantics.

First, this paper gives a brief introduction to the related technologies, including ontology, Chinese word segmentation and focused crawling. We then propose an ontology-based similarity calculation model and a feature vector extraction model based on web page structure. With these two models, the focused crawler achieves higher efficiency and accuracy than with traditional methods. Next, we use the open source project Lucene to index the crawled web pages, create a user search interface, and implement a tourism search engine. Finally, we sum up the main points of this paper and discuss prospects for future work.

The main contents of this paper are:
1. The concepts of the search engine and the technologies related to the topic-specific search engine.
2. Document classification concepts and related technologies.
3. Ontology and the domain ontology-based similarity model.
4. Lucene-based full-text retrieval.
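
The abstract does not give the formulas behind the two proposed models, so the following Python fragment is only a rough sketch of the general idea, not the thesis's method. It assumes a toy tourism ontology, hypothetical tag weights (title and heading terms count more than body terms), a hypothetical topic vector that weights concepts by their depth in the ontology, and cosine similarity as the relevance measure. All names (TAG_WEIGHTS, ONTOLOGY, StructureWeightedExtractor, page_relevance) are invented for illustration.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Assumption: terms in the title or in headings are weighted above body text.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "h2": 1.5}
DEFAULT_WEIGHT = 1.0

# Assumption: a toy tourism ontology, concept -> parent (None marks the root).
ONTOLOGY = {
    "tourism": None,
    "hotel": "tourism",
    "attraction": "tourism",
    "museum": "attraction",
    "beach": "attraction",
}


def depth(concept):
    """Number of edges between a concept and the ontology root."""
    d = 0
    while ONTOLOGY[concept] is not None:
        concept = ONTOLOGY[concept]
        d += 1
    return d


# Assumption: concepts closer to the root describe the topic more strongly.
TOPIC_VECTOR = {c: 1.0 / (1 + depth(c)) for c in ONTOLOGY}


class StructureWeightedExtractor(HTMLParser):
    """Builds a term-weight vector, scaling each term by its enclosing tags."""

    def __init__(self):
        super().__init__()
        self.stack = []           # currently open tags
        self.weights = Counter()  # term -> accumulated weight

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        # Use the highest weight among all enclosing tags.
        weight = max(
            (TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.stack),
            default=DEFAULT_WEIGHT,
        )
        for term in re.findall(r"[a-z]+", data.lower()):
            self.weights[term] += weight


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def page_relevance(html):
    """Score a page against the topic vector derived from the ontology."""
    extractor = StructureWeightedExtractor()
    extractor.feed(html)
    extractor.close()
    return cosine(extractor.weights, TOPIC_VECTOR)


if __name__ == "__main__":
    page = ("<html><head><title>Beach hotel</title></head>"
            "<body><p>A museum near the beach.</p></body></html>")
    print(page_relevance(page))
```

A focused crawler could compare such a score with a threshold to decide whether to keep a page and follow its links. A real system for Chinese pages would also need a word segmentation step in place of the simple regular expression used here.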
Keywords/Search Tags: Ontology, Topic-specific search engines, Document classification, Lucene, Focused crawler