
Research On The Topical Search Engine Based On Semantic

Posted on: 2012-07-29
Degree: Master
Type: Thesis
Country: China
Candidate: X B Zheng
Full Text: PDF
GTID: 2218330338470714
Subject: Computer software and theory
Abstract/Summary:
With the extensive development of Internet technology, the Internet has become the world's largest repository of information resources. In the 21st century, finding information on the Internet quickly, accurately, and comprehensively is a major problem. A general search engine has a huge index and covers broad themes, which solves some information access problems. But users are diverse and their search needs vary widely, so a general search engine cannot meet the precise information needs of special fields and special user groups. Users often have to sift through the search results to find the information they are actually interested in. This has pushed search engines in a professional, intelligent direction, and high-accuracy topical (theme) search engines have been developed and applied.

Topical search engines emerged to improve the efficiency of Internet information retrieval in specific fields. A topical search engine obtains theme-related information through a Web crawler, indexes it, and provides relevant information and services to users. It is a refinement of the general search engine for a particular field, and it meets industry users' need to find subject information quickly and accurately.

In the course of developing a topical search engine system, this thesis studies the key technologies of topical search engines, including word segmentation, feature extraction, weight calculation, text classification, and text similarity calculation. It improves the traditional similarity measure by adding semantic relations between words, and applies the result successfully in a topical search engine.

First, the thesis introduces the state of topical search engines and the research methods used at home and abroad, and notes the background and significance of the study.

Second, it expounds the implementation principles and key technologies of search engines, and introduces in detail the technical framework, web crawlers, web content analysis, web indexing and retrieval, classification techniques, and web page ranking technology.

Third, it studies in depth the traditional text similarity algorithm and its shortcomings. The traditional algorithm does not consider semantic similarity, so words are treated as independent of one another. To address this shortcoming, the thesis extends the algorithm with the generalized vector space model: the semantic similarity of two words is calculated from vocabulary knowledge and applied to the generalized vector space model to obtain a new text similarity algorithm.

Finally, through secondary development on the open source Nutch framework, the thesis implements functions such as theme resource discovery, Chinese word segmentation, and theme filtering. The new text similarity algorithm is then applied to the theme filter to complete a topical search engine test system.
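
The idea of adding word-to-word semantic similarity to a vector space model can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract does not give the formula, the weighting scheme, or the vocabulary resource used, so the code assumes raw term-frequency weights, a common generalized vector space form (the sum over all term pairs of weight times weight times word similarity, normalized in the same way), and a hypothetical toy similarity table. The names `gvsm_similarity`, `toy_word_sim`, `exact_match`, and `TOY_RELATED` are all invented for illustration.

```python
import math
from collections import Counter

# Hypothetical toy table standing in for a real vocabulary knowledge base.
TOY_RELATED = {frozenset(("car", "automobile")): 0.9}


def toy_word_sim(u, v):
    """Assumed word similarity in [0, 1]; identical words score 1.0."""
    if u == v:
        return 1.0
    return TOY_RELATED.get(frozenset((u, v)), 0.0)


def exact_match(u, v):
    """Traditional assumption: different words are fully independent."""
    return 1.0 if u == v else 0.0


def gvsm_similarity(tokens_a, tokens_b, word_sim):
    """Similarity of two tokenized texts with pairwise word similarity.

    Uses raw term frequencies as weights (an assumption) and expects
    word_sim to be symmetric with values in [0, 1].
    """
    a, b = Counter(tokens_a), Counter(tokens_b)

    def bilinear(x, y):
        # Sum over every term pair, not only over matching terms.
        return sum(wx * wy * word_sim(tx, ty)
                   for tx, wx in x.items()
                   for ty, wy in y.items())

    norm = math.sqrt(bilinear(a, a) * bilinear(b, b))
    return bilinear(a, b) / norm if norm > 0 else 0.0


doc1 = ["car", "engine"]
doc2 = ["automobile", "engine"]
print(gvsm_similarity(doc1, doc2, exact_match))   # plain cosine similarity
print(gvsm_similarity(doc1, doc2, toy_word_sim))  # higher: related words count
```

With `exact_match` the function reduces to ordinary cosine similarity, so the two texts only get credit for the shared word. With the toy similarity table the related pair also contributes, which is the effect the abstract describes. A theme filter could then keep a page when its similarity to a theme description exceeds a chosen threshold; the threshold value would have to be tuned and is not specified in the abstract.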
Keywords/Search Tags: theme search engine, feature extraction, text classification, text similarity calculation