Research In Chinese Information Retrieval Based On Cross Terms

Posted on: 2017-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: G L Zhou
Full Text: PDF
GTID: 2348330488982879
Subject: Computer technology
Abstract/Summary:
Over the past ten years, with the growth of the World Wide Web, the volume of information and data has grown exponentially. How to obtain the required information from such a large volume effectively and efficiently has become an enduring theme of information retrieval research, driven by users' rapidly growing information needs. Most traditional information retrieval models assume that query terms are independent of each other. Although these models have been applied to Chinese information retrieval with good performance, the term independence assumption has limitations, and modeling word association within a probabilistic retrieval framework is theoretically promising. To improve retrieval performance, the association between query terms must be considered. This thesis therefore studies a Chinese information retrieval method based on the cross term model. The main work has two parts.

First, to improve the performance of the retrieval model, the cross term model is introduced into Chinese information retrieval to model word proximity. Through cross terms, the association among multiple query terms can be expressed in the same way as simple unigram pseudo terms. A bigram cross term occurs when the corresponding query terms are close to each other, and kernel functions are used to model this effect. Chinese text is segmented in two ways, lexicon-based and bigram-based, and the resulting words are separated by spaces. In modeling, positions in the document are counted in words. The model thus considers not only the proximity of words but also, to some extent, more complex relationships among them, which helps improve retrieval accuracy.
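As a rough illustration of the two ingredients just described, the sketch below segments Chinese text into character bigrams and scores how strongly two query terms co-occur in a document through a kernel over their word-position distance. The Gaussian kernel, the `sigma` width, the summation over all position pairs, the overlapping bigram scheme, and all function names are assumptions made for this example; the abstract does not state which kernels or aggregation the thesis uses.

```python
import math


def bigram_segment(text):
    """Split text into overlapping two-character units (assumed scheme)."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def gaussian_kernel(distance, sigma=5.0):
    """Weight that decays as two positions move apart (assumed kernel)."""
    return math.exp(-(distance ** 2) / (2 * sigma ** 2))


def cross_term_frequency(tokens, term_a, term_b, sigma=5.0):
    """Kernel-weighted count of a pseudo term formed by two query terms.

    Positions are word indices in the segmented document, so distance
    is measured in words rather than characters.
    """
    positions_a = [i for i, t in enumerate(tokens) if t == term_a]
    positions_b = [i for i, t in enumerate(tokens) if t == term_b]
    return sum(
        gaussian_kernel(abs(i - j), sigma)
        for i in positions_a
        for j in positions_b
    )


if __name__ == "__main__":
    # Space-separated output, as in the preprocessing described above.
    print(" ".join(bigram_segment("信息检索")))
    doc = ["信息", "检索", "模型"]
    print(cross_term_frequency(doc, "信息", "检索"))
```

A value computed this way could stand in for a term frequency in a BM25-style weighting, which is one way to treat a cross term like an ordinary unigram; how the thesis actually combines the two is not given here.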
Experiments on the NTCIR-5 and NTCIR-6 Chinese test collections indicate that, compared with the traditional retrieval method, the Chinese information retrieval method based on the cross term model performs better.

Second, the positions of terms in documents are taken into account in pseudo relevance feedback in order to obtain more appropriate expansion terms for the original query. Term position information is added to the dependency model, which gives full consideration to the positions of query terms in the feedback documents. The cross term model is used to capture the positional relationships among the query terms and to give higher weights to words close to them, so that terms closer to the query topic are selected as expansion terms. Two ways of estimating the pseudo relevance feedback model are considered: the independent and identically distributed method and the conditional sampling method. The retrieval performance of the two feedback methods is verified with lexicon-based indexing on the NTCIR-5 documents. The experimental results show that, compared with traditional feedback methods, the Chinese pseudo relevance feedback method based on the cross term model performs better.
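The idea of favoring expansion candidates that appear near query terms in the feedback documents can be sketched as follows. This is a simplified illustration, not the thesis's independent and identically distributed or conditional sampling estimators: the Gaussian decay, the `sigma` parameter, the plain summation over feedback documents, and the function name are all assumptions.

```python
import math
from collections import defaultdict


def proximity_expansion_scores(feedback_docs, query_terms, sigma=5.0):
    """Rank candidate expansion terms by closeness to query terms.

    feedback_docs: list of documents, each a list of segmented words.
    Returns (term, score) pairs sorted from highest to lowest score.
    """
    query = set(query_terms)
    scores = defaultdict(float)
    for tokens in feedback_docs:
        query_positions = [i for i, t in enumerate(tokens) if t in query]
        for p, term in enumerate(tokens):
            if term in query:
                continue  # only non-query words are expansion candidates
            for q in query_positions:
                # Words nearer to a query term receive a larger weight.
                scores[term] += math.exp(-((p - q) ** 2) / (2 * sigma ** 2))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    docs = [["检索", "模型", "a", "b", "c", "d", "e", "天气"]]
    for term, score in proximity_expansion_scores(docs, ["检索"], sigma=2.0):
        print(term, round(score, 4))
```

In practice the top-ranked candidates would be appended to the original query with some interpolation weight; that weight and the number of terms kept are tuning choices not specified in this abstract.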
Keywords/Search Tags: Chinese information retrieval, cross terms, proximity, BM25, pseudo relevance feedback