Research In Chinese Information Retrieval Based On Cross Terms

Posted on:2017-03-15

Degree:Master

Type:Thesis

Country:China

Candidate:G L Zhou

Full Text:PDF

GTID:2348330488982879

Subject:Computer technology

Abstract/Summary:


Over the past ten years, as the World Wide Web has flourished, the volume of information and data has grown exponentially. How to obtain the required information effectively and efficiently from such a large volume has therefore become an enduring theme of information retrieval research, which aims to better meet users' rapidly growing information needs. Most traditional information retrieval models assume that query terms are independent of one another. Although these models have been applied to Chinese information retrieval with good results, the term independence assumption has limitations, and modeling word association within the probabilistic retrieval framework is theoretically promising. To improve retrieval performance, the association between query terms must be considered. This paper therefore studies Chinese information retrieval methods based on the cross term model. Its main work covers the following two aspects.

First, to improve the performance of the retrieval model, the cross term model is introduced into Chinese information retrieval to model word proximity. Through cross terms, the association among multiple query terms can be expressed as pseudo terms that are handled in the same way as ordinary unigram terms. A bigram cross term occurs when the corresponding query terms appear close to each other, and we use kernel functions to model this effect. We segment Chinese text in two ways, lexicon-based and bigram-based, and separate the resulting words with spaces. In modeling, positions in the document are counted in words. The model thus captures not only the proximity of words but also, to a certain extent, more complex relationships among them, which helps improve retrieval accuracy.
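The idea of a kernel-weighted cross term can be illustrated with a minimal sketch. It is not the implementation used in the thesis: the function names, the Gaussian kernel, the default sigma, and the choice to sum the kernel over all pairs of occurrences are assumptions made for illustration. The sketch splits text into overlapping character bigrams and then computes a pseudo term frequency for a pair of query terms that grows as their occurrences get closer.

```python
import math
from itertools import product


def bigram_segment(text):
    """Split text into overlapping character bigrams (whitespace ignored).

    Hypothetical helper: stands in for the bigram-based segmentation
    mentioned in the abstract.
    """
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def gaussian_kernel(distance, sigma):
    """Weight in (0, 1] that decays as the word distance grows."""
    return math.exp(-(distance ** 2) / (2 * sigma ** 2))


def cross_term_frequency(tokens, term_a, term_b, sigma=5.0):
    """Kernel-weighted pseudo frequency of the cross term (term_a, term_b).

    Assumption: every pair of occurrences contributes a kernel value based
    on the distance, counted in tokens, between the two positions.
    """
    pos_a = [i for i, t in enumerate(tokens) if t == term_a]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b]
    return sum(
        gaussian_kernel(abs(i - j), sigma) for i, j in product(pos_a, pos_b)
    )


if __name__ == "__main__":
    tokens = bigram_segment("信息检索")
    print(tokens)
    print(cross_term_frequency(tokens, "信息", "检索", sigma=2.0))
```

The resulting pseudo frequency could then be passed to a unigram weighting function such as BM25 in place of a raw term frequency, which is how a cross term can be treated like an ordinary term; the exact combination is left open here.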
We evaluate the approach on the Chinese test collections NTCIR-5 and NTCIR-6. The results indicate that, compared with the traditional retrieval method, the Chinese information retrieval method based on the cross term model performs better.

Second, we take the positions of terms within documents into account in pseudo relevance feedback in order to obtain more appropriate expansion terms for the original query. Term position information is added to the dependency model, which gives full consideration to where query terms occur in the feedback documents. The cross term model is used to capture the positional relationships among query terms and to give higher weights to words close to the query terms, so that terms closer to the query topic are selected as expansion terms. In this paper, we consider two ways of estimating pseudo relevance feedback: an independent and identically distributed method and a conditional sampling method. We verify the retrieval performance of the two feedback methods using lexicon-based indexing on the NTCIR-5 documents. The experimental results show that, compared with the traditional feedback methods, the Chinese pseudo relevance feedback method based on the cross term model performs better.
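The position-aware selection of expansion terms can be sketched as follows. This is a simplified illustration, not the thesis's estimator: it does not implement the independent and identically distributed or the conditional sampling method, and the function names, the Gaussian kernel, and the use of the nearest query term position are all assumptions. Each candidate word in a feedback document receives a weight that is higher when it occurs close to a query term, and the weights are summed over the feedback documents.

```python
import math
from collections import defaultdict


def proximity_expansion_scores(feedback_docs, query_terms, sigma=5.0):
    """Score candidate expansion terms by closeness to query terms.

    feedback_docs: list of token lists (already segmented documents).
    Assumption: each occurrence of a candidate term contributes a Gaussian
    weight based on its distance to the nearest query term occurrence.
    """
    query = set(query_terms)
    scores = defaultdict(float)
    for tokens in feedback_docs:
        query_positions = [i for i, t in enumerate(tokens) if t in query]
        if not query_positions:
            continue  # no query term in this document, nothing to anchor on
        for i, term in enumerate(tokens):
            if term in query:
                continue  # query terms are not candidates for expansion
            nearest = min(abs(i - q) for q in query_positions)
            scores[term] += math.exp(-(nearest ** 2) / (2 * sigma ** 2))
    return dict(scores)


def top_expansion_terms(feedback_docs, query_terms, k=10, sigma=5.0):
    """Return the k highest-scoring candidates (ties broken alphabetically)."""
    scores = proximity_expansion_scores(feedback_docs, query_terms, sigma)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    docs = [["q", "near", "w1", "w2", "w3", "far"]]
    print(top_expansion_terms(docs, ["q"], k=2, sigma=1.0))
```

A smaller sigma concentrates the weight on words immediately next to query terms, while a larger sigma approaches plain frequency counting over the feedback documents.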

Keywords/Search Tags:

Chinese Information Retrieval, Cross terms, proximity, BM25, pseudo relevance feedback


Related items

1	Cross Language Information Retrieval Based On Topical Pseudo Relevance Feedback
2	Research In Chinese Information Retrieval Based On Positional Language Models
3	Research On Pre-trained BERT Based Pseudo-relevance Feedback Method
4	A Study Of Collection-based Features For Adapting The Balance Parameter In Pseudo Relevance Feedback
5	Research On Pseudo Relevance Feedback Based On Document Similarity
6	Research On Pseudo Relevance Feedback Query Expansion Technology Based On Latent Semantic Relation
7	Studies On Affinity Propagation Based Pseudo-Relevance Feedback And Document Expansion For Spoken Document Retrieval
8	Research Of Mongolian Retrieval Technology Based On The New Incremental Query Expansion
9	Research On Retrieval Method Based On Positional Relationship In Document
10	Research On Entity-based Information Retrieval Models