Domain Information Retrieval Based On Term Relationships Of Thesaurus

Posted on: 2012-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: X Xiong
Full Text: PDF
GTID: 2178330335979518
Subject: Information Science
Abstract/Summary:
A thesaurus is a controlled vocabulary consisting of scientific terms selected from natural language and organized according to the semantic relations between them. It is an important term-control tool for translating users' natural language into a controlled language, and it has contributed greatly to traditional information retrieval. Since the 1990s, with the dramatic development of Internet technology, the information environment has changed radically. The construction methods, presentation formats, and usage modes of traditional thesauri no longer suit networked information. At the same time, the shortcomings of network information retrieval systems such as search engines are gradually becoming apparent. Against this background, the application of thesauri has become a new research focus in information retrieval.

Through in-depth study and analysis of network information retrieval systems and traditional thesauri, this thesis surveys the current state of existing search engines and document databases, the construction and application of thesauri, and the principles of query expansion. On this basis, it designs a method for domain information retrieval based on the term relationships of a thesaurus. The method combines controlled language with natural language, applies query expansion and term-weighted retrieval, and derives a relevance ranking algorithm.

To verify the feasibility and effectiveness of this method, a prototype system was implemented in C# with SQL Server. Two categories with neither too many nor too few terms were selected from the "Agricultural Thesaurus", and search results from the Baidu search engine and Wanfang Data were used as experimental material. The experiment has two main stages: determining the optimal weight vectors of the expansion terms, and evaluating the effect of relevance ranking.

The experimental results show that, among all term relationships in the thesaurus, synonyms and hyponyms greatly improve retrieval precision, while hypernyms and related terms have little positive effect. Many problems remain in the implementation and popularization of thesaurus-based information retrieval systems, and continued improvement and innovation are needed to adapt thesauri to the network information environment.
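
The idea of expanding a query along thesaurus relationships and weighting each expansion term by its relationship type can be illustrated with a minimal Python sketch. It is not the thesis's C# prototype or its ranking algorithm: the thesaurus entry, the weight values, and the function names `expand_query`, `score`, and `rank` are all assumptions made for illustration, and the scoring is a plain weighted occurrence count.

```python
# Hypothetical weights per relationship type. The abstract reports no values;
# these only mirror its finding that synonyms and hyponyms help most.
RELATION_WEIGHTS = {"synonym": 0.8, "hyponym": 0.6, "hypernym": 0.1, "related": 0.1}

# Toy thesaurus entry, invented for illustration (not taken from the
# "Agricultural Thesaurus").
THESAURUS = {
    "wheat": {
        "synonym": ["triticum"],
        "hyponym": ["winter wheat"],
        "hypernym": ["cereal"],
        "related": ["flour"],
    }
}


def expand_query(term, thesaurus=THESAURUS, weights=RELATION_WEIGHTS):
    """Return {term: weight}: the original term at 1.0 plus its expansions."""
    expanded = {term: 1.0}
    for relation, terms in thesaurus.get(term, {}).items():
        weight = weights.get(relation, 0.0)
        for t in terms:
            # Keep the highest weight if a term appears under several relations.
            expanded[t] = max(expanded.get(t, 0.0), weight)
    return expanded


def score(document, weighted_terms):
    """Weighted count of substring occurrences of each term in the document."""
    text = document.lower()
    return sum(w * text.count(t) for t, w in weighted_terms.items())


def rank(documents, term):
    """Order documents by descending score, dropping those that score zero."""
    weighted = expand_query(term)
    scored = [(score(d, weighted), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for s, d in scored if s > 0]
```

Because matching is by substring, a document containing "winter wheat" is credited for both "wheat" and "winter wheat". A real system would tokenize the text and tune the weights experimentally, as the first experimental stage described above does.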
Keywords/Search Tags: Thesaurus, Term relationships, Information retrieval, Query expansion, Weighted retrieval