Research On Domain-oriented Intelligent Deep Search Engine

Posted on: 2012-04-09 | Degree: Master | Type: Thesis
Country: China | Candidate: F Wang | Full Text: PDF
GTID: 2268330392963268 | Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the Web has become a global repository of information, and web information is increasing exponentially. Mankind has entered the era of information explosion. How to retrieve the required information quickly and accurately from vast amounts of information is a question that information retrieval systems need to solve. Today's search engines (SE) provide information retrieval services, but they have deficiencies. Although general search engines perform reasonably well over a wide range of information, they are less competent for queries in specific areas, because the domains they cover are too broad for them to go deep into, or specialize in, any one area. Meanwhile, the Deep Web has brought great challenges to traditional crawler-based search engine technology. In addition, most existing search engines rely mainly on keyword-based text search or on browsing categories organized by site theme, so their lack of semantic processing often leads to false or missed results. Therefore, how to improve search engine technology, enhance the quality of Web information retrieval, and find new, intelligent search methods has become an important research issue in information retrieval, data mining, and other research areas. The main work of this thesis lies in the following three aspects:

First, this thesis analyzes the development status of search engines, discusses the research significance and architecture of topic-based search engines, and studies in depth the core technologies of topic-based search, including subject relevance judgment, Chinese word segmentation, and page ranking techniques. Taking popular science as an example domain, this thesis designs and implements a topic classifier using the SVM classification algorithm, which performs well in text classification. Experiments show that the classifier achieves an accuracy as high as 94%.

Second, this thesis studies and discusses the origins of the deep web, its characteristics, and the state of research on it. Combining this with topic-based search techniques, it studies and designs a domain-oriented deep search engine. A real-time deep web information integration module is designed and implemented using a form-filling technique based on structural analysis of web pages. The module plays a distinct role in improving the search depth and real-time performance of the topic-based search engine.

Third, building on the above research and design, extensive research and analysis were carried out on the Semantic Web, domain ontology, and related technologies. This thesis presents a model of a domain-oriented intelligent deep search engine that integrates information retrieval technology with the characteristics of the Semantic Web and domain ontology. The model has the following key design points: subject relevance judgment, domain-oriented deep web information integration, automatic construction of the domain ontology, semantic inference, and the implementation of concept similarity algorithms.

The distinguishing features and innovations of this thesis are:
1) By combining topic-based search with domain deep web integration and applying them to the popular science domain, this thesis designs and implements the first popular science domain search engine.
2) An automatic ontology construction method based on Wikipedia resources is proposed. Taking the popular science domain as an example, this thesis uses the method to establish the first popular science ontology in China.
3) Using semantic search technology, the domain ontology is applied to semantic inference and query expansion in the search engine.

This thesis presents a domain-oriented intelligent deep search engine model and makes considerable effort to improve the recall, precision, and semantic comprehension capability of topic-based search engines.
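
The abstract names concept similarity and ontology-based query expansion but does not give their details. As a rough illustration only, the Python sketch below shows one common way such a step can work: a path-based (Wu-Palmer-style) similarity over an is-a hierarchy, used to add closely related concepts to a query. The toy ontology, the function names, the choice of similarity measure, and the 0.7 threshold are all assumptions made for this example and are not taken from the thesis.

```python
# Hypothetical toy is-a hierarchy for a popular-science domain (child -> parent).
# These concepts are invented for illustration; they are not from the thesis.
IS_A = {
    "astronomy": "science",
    "physics": "science",
    "planet": "astronomy",
    "star": "astronomy",
    "mars": "planet",
    "jupiter": "planet",
    "sun": "star",
}


def ancestors(concept):
    """Return the chain from the concept up to its root, concept first."""
    path = [concept]
    while path[-1] in IS_A:
        path.append(IS_A[path[-1]])
    return path


def depth(concept):
    """Depth in the hierarchy, counting the root as depth 1."""
    return len(ancestors(concept))


def similarity(a, b):
    """Wu-Palmer-style score: 2 * depth(lcs) / (depth(a) + depth(b)).

    lcs is the deepest concept that is an ancestor of both a and b.
    Returns 0.0 when the two concepts share no ancestor.
    """
    up_b = set(ancestors(b))
    lcs = next((c for c in ancestors(a) if c in up_b), None)
    if lcs is None:
        return 0.0
    return 2 * depth(lcs) / (depth(a) + depth(b))


def expand_query(terms, threshold=0.7):
    """Append ontology concepts whose similarity to a query term meets the threshold.

    Terms that are not in the ontology are kept but not expanded.
    """
    known = set(IS_A) | set(IS_A.values())
    expanded = list(terms)
    for term in terms:
        if term not in known:
            continue
        for concept in sorted(known):
            if concept not in expanded and similarity(term, concept) >= threshold:
                expanded.append(concept)
    return expanded


if __name__ == "__main__":
    print(similarity("mars", "jupiter"))
    print(expand_query(["mars", "photo"]))
```

In this sketch, a query containing "mars" picks up its sibling and its parent concept because they share a deep common ancestor, while distant concepts such as "physics" fall below the threshold. A real system would load the ontology from a store built by the construction method the thesis describes, rather than from a hard-coded dictionary.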
Keywords/Search Tags: deep web, search engine topics, SVM algorithm, popular science domain ontology, semantic search