
The Research On Long Query Expansion On The Concept Of Semantic Similarity

Posted on:2014-11-30
Degree:Master
Type:Thesis
Country:China
Candidate:Z Y Yang
GTID:2348330485494957
Subject:Information Science
Abstract/Summary:
With the rapid development of the Internet, network information retrieval has also developed quickly. Today the main form of information retrieval is the search engine, which has become the second most widely used network service after e-mail. Most current search engines retrieve information by keywords, but a few input words cannot fully express the meaning of a query. Word ambiguity causes search engines to return large numbers of irrelevant documents, greatly reducing recall and precision. Users also sometimes submit long queries, and because the topic of such a query tends to drift, the retrieval results are often unsatisfactory. To address these problems, researchers proposed query expansion, which modifies the original query terms to improve retrieval precision and recall. This work has achieved some success, but mostly for short queries. In recent years, however, researchers outside China have paid more attention to long queries, because natural-language sentences express complex and specific information needs better, and such sentences are likely to become the way users express queries in the future. The rich semantic relationships in a long query also provide a better basis for semantic query expansion and should help in understanding users' language features and differing syntactic habits.

To address topic drift, low precision, and the low ranking of relevant documents (and hence low recall) for long queries in search engines, this paper proposes long query expansion based on the semantic similarity of concepts. First, the AAlesk algorithm is used to determine the correct sense of each query word, and the WordNet concepts for that sense are added to the original long query. Second, the concepts are clustered by semantic similarity to obtain a set of query clusters; the overall semantic relevancy of each cluster and the semantic importance of each concept are then computed to obtain the best candidate concepts.
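The disambiguation and expansion step can be illustrated with a minimal Python sketch. It assumes the simplified Lesk algorithm (pick the sense whose gloss shares the most content words with the query), which is not necessarily the AAlesk variant used in this work, and a tiny hand-made sense inventory in place of WordNet. The names `SENSES`, `RELATED`, `STOPWORDS`, `normalize`, `content_words`, `simplified_lesk` and `expand_query` are hypothetical, as are the glosses and related concepts.

```python
from typing import Dict, List, Set

# Hypothetical toy sense inventory standing in for WordNet: word -> {sense id: gloss}.
SENSES: Dict[str, Dict[str, str]] = {
    "bank": {
        "bank.finance": "institution that accepts deposits and lends money",
        "bank.river": "sloping land beside a body of water such as a river",
    },
    "interest": {
        "interest.money": "fixed charge for borrowing money from a lender",
        "interest.curiosity": "a sense of concern with and curiosity about something",
    },
}
# Hypothetical related concepts (e.g. synonyms) attached to each sense.
RELATED: Dict[str, List[str]] = {
    "bank.finance": ["depository", "lender"],
    "bank.river": ["riverbank", "shore"],
    "interest.money": ["loan charge", "lender"],
    "interest.curiosity": ["curiosity"],
}
# Small illustrative stopword list so that function words do not count as overlap.
STOPWORDS = {"a", "an", "the", "and", "of", "to", "for", "from", "with",
             "that", "such", "as", "about", "does", "when", "i", "what"}


def normalize(word: str) -> str:
    """Lower-case a token and strip trailing punctuation."""
    return word.lower().strip(".,?!")


def content_words(text: str) -> Set[str]:
    """Normalized words of `text` with stopwords removed."""
    words = (normalize(w) for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def simplified_lesk(word: str, context: Set[str]) -> str:
    """Return the sense of `word` whose gloss shares the most content words with `context`."""
    senses = SENSES[word]
    return max(senses, key=lambda s: len(content_words(senses[s]) & context))


def expand_query(query: str) -> List[str]:
    """Disambiguate each ambiguous query word, then append its sense's related concepts."""
    context = content_words(query)
    expanded = query.split()
    for word in map(normalize, query.split()):
        if word in SENSES:
            sense = simplified_lesk(word, context)
            for concept in RELATED[sense]:
                if concept not in expanded:  # avoid adding the same concept twice
                    expanded.append(concept)
    return expanded


if __name__ == "__main__":
    q = "What interest rate does the bank charge when I borrow money"
    print(simplified_lesk("bank", content_words(q)))  # bank.finance
    print(expand_query(q)[-3:])  # ['loan charge', 'lender', 'depository']
```

Here "money" and "charge" in the query select the financial senses of "bank" and "interest", so the river-related concepts are never added. A real implementation would read glosses and related synsets from WordNet rather than from these toy dictionaries.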
Finally, keywords are selected according to their scores in the concept set and used to represent the original long query. This paper also applies the KeyGraph keyword extraction method to the long queries, and both kinds of results are tested in retrieval experiments with three different types of retrieval models. The experimental results show that the improved long queries achieve better retrieval effectiveness. In particular, the method proposed in this paper expresses users' real information needs at the semantic level, greatly improves the precision and recall of long queries, and is more suitable for use with existing mainstream retrieval models.
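The clustering, scoring and keyword selection steps can be sketched in the same hedged way. The abstract does not specify the similarity measure, clustering algorithm or scoring formula, so this sketch assumes: Jaccard overlap of ancestor sets in a toy taxonomy as the semantic similarity (a stand-in for a WordNet measure), greedy leader clustering with a fixed threshold, cluster relevancy as the mean similarity between cluster members and the query concepts, concept importance as the summed similarity to the other members of its cluster, and a combined score of relevancy * (1 + importance). The taxonomy and every function name are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy taxonomy: concept -> its ancestors, including itself.
ANCESTORS: Dict[str, Set[str]] = {
    "lender": {"lender", "creditor", "person", "entity"},
    "creditor": {"creditor", "person", "entity"},
    "depository": {"depository", "financial_institution", "organization", "entity"},
    "bank": {"bank", "financial_institution", "organization", "entity"},
    "shore": {"shore", "land", "object", "entity"},
}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of ancestor sets (assumed stand-in for a WordNet similarity)."""
    sa, sb = ANCESTORS[a], ANCESTORS[b]
    return len(sa & sb) / len(sa | sb)


def cluster(concepts: List[str], threshold: float) -> List[List[str]]:
    """Greedy leader clustering: join the first cluster whose leader is similar enough."""
    clusters: List[List[str]] = []
    for c in concepts:
        for cl in clusters:
            if similarity(c, cl[0]) >= threshold:
                cl.append(c)
                break
        else:  # no sufficiently similar cluster: start a new one
            clusters.append([c])
    return clusters


def cluster_relevancy(cl: List[str], query: List[str]) -> float:
    """Mean similarity between the cluster's members and the query concepts."""
    total = sum(similarity(c, q) for c in cl for q in query)
    return total / (len(cl) * len(query))


def concept_importance(c: str, cl: List[str]) -> float:
    """Summed similarity of a concept to the other members of its cluster."""
    return sum(similarity(c, o) for o in cl if o != c)


def select_keywords(concepts: List[str], query: List[str],
                    threshold: float = 0.5, k: int = 2) -> List[str]:
    """Score every candidate concept and return the top-k as keywords."""
    scored: List[Tuple[float, str]] = []
    for cl in cluster(concepts, threshold):
        rel = cluster_relevancy(cl, query)
        for c in cl:
            scored.append((rel * (1 + concept_importance(c, cl)), c))
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest score first, ties by name
    return [c for _, c in scored[:k]]


if __name__ == "__main__":
    candidates = ["lender", "creditor", "depository", "shore"]
    print(cluster(candidates, 0.5))  # [['lender', 'creditor'], ['depository'], ['shore']]
    print(select_keywords(candidates, ["bank"]))  # ['depository', 'creditor']
```

With these toy values "lender" and "creditor" form one cluster, while "depository", which shares the most ancestors with the query concept "bank", ranks first. Multiplying by relevancy keeps a tightly knit but off-topic cluster from outranking a relevant singleton; this weighting is a design choice of the sketch, not a formula taken from the thesis.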
Keywords/Search Tags:Long query, Semantic similarity, Retrieval model, WordNet