
Research In Chinese Information Retrieval Based On Positional Language Models

Posted on: 2016-04-17  Degree: Master  Type: Thesis
Country: China  Candidate: Y L Chen
GTID: 2308330464472622  Subject: Computer application technology
Abstract/Summary:
With the rapid development of global informatization, Chinese information resources on the Internet are becoming increasingly rich. Extracting valuable information from such massive data has become a hot topic in information retrieval, which makes improving retrieval technology an urgent task. Most existing retrieval models score documents mainly on term statistics such as within-document term frequency, inverse document frequency, and document length. However, these models do not fully consider the proximity of matched query terms within a document. Although they have been applied to Chinese information retrieval with good results, there is still room for improvement. This thesis therefore focuses on Chinese information retrieval based on the Positional Language Model (PLM). The main original contributions are:
(1) We introduce the Positional Language Model into Chinese information retrieval, using term positions and proximity within a document to capture the positional information of query terms, so that the retrieved documents are more relevant to the query topic. Chinese text is segmented with both a 2-gram-based and a dictionary-based segmentation method, and a space is inserted between words after segmentation. During modeling, positions are counted in units of terms (segmented words), so the model accounts not only for positional proximity but also for the complex relationships between terms. Experiments on the Chinese test collections NTCIR-5 and NTCIR-6 show that Chinese information retrieval based on the positional language model outperforms traditional retrieval methods.
(2) We also incorporate document position features into pseudo relevance feedback to help the original query obtain more appropriate expansion terms.
In this method, we incorporate the positional information of terms in the feedback documents into the relevance model: the positional language model captures the positional relationships between each word and the query terms in a feedback document, and words closer to the query terms receive higher weights, so that the selected expansion terms are more closely related to the query topic. Two methods of estimating the pseudo relevance feedback model are considered: independent and identically distributed (i.i.d.) sampling and conditional sampling. We evaluated both methods on NTCIR-5 with a dictionary-based index; the results show that Chinese pseudo relevance feedback based on the positional language model outperforms traditional feedback methods.
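To make the two contributions more concrete, the following Python sketch shows one way a PLM could score a segmented Chinese document and then weight feedback terms by their proximity to query terms. It is not the thesis's implementation. The Gaussian propagation kernel, Dirichlet smoothing, best-position scoring, the parameter values, and every function name (`bigram_segment`, `propagated_counts`, `smoothed`, `plm_score`, `positional_expansion`) are assumptions made for illustration. Positions are word indices after segmentation, matching the abstract's choice of the term as the unit. Only the 2-gram segmentation and an i.i.d.-sampling estimate are shown; dictionary-based segmentation (which needs a lexicon) and conditional sampling are omitted.

```python
import math
from collections import Counter, defaultdict

# Hypothetical defaults: the abstract reports no kernel shape or parameter values.
SIGMA = 25.0  # Gaussian kernel width, measured in word positions
MU = 1000.0   # Dirichlet smoothing prior
FLOOR = 1e-9  # collection probability assumed for terms missing from `coll`


def bigram_segment(text):
    """Overlapping character 2-grams, e.g. 信息检索 -> 信息 息检 检索."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return chars
    return [chars[k] + chars[k + 1] for k in range(len(chars) - 1)]


def propagated_counts(tokens, i, sigma):
    """Virtual counts c'(w, i): each occurrence at position j adds exp(-(i-j)^2 / 2sigma^2)."""
    counts = Counter()
    for j, w in enumerate(tokens):
        counts[w] += math.exp(-((i - j) ** 2) / (2 * sigma ** 2))
    return counts, sum(counts.values())


def smoothed(counts, z, w, coll, mu):
    """Dirichlet-smoothed positional model p(w | D, i)."""
    return (counts[w] + mu * coll.get(w, FLOOR)) / (z + mu)


def plm_score(query, doc, coll, sigma=SIGMA, mu=MU):
    """Rank by the best position: max over i of the query log-likelihood under p(. | D, i)."""
    best = -math.inf
    for i in range(len(doc)):
        counts, z = propagated_counts(doc, i, sigma)
        best = max(best, sum(math.log(smoothed(counts, z, q, coll, mu)) for q in query))
    return best


def positional_expansion(query, feedback_docs, coll, n_terms=10, sigma=SIGMA, mu=MU):
    """i.i.d.-sampling relevance model over positions, uniform P(D) and P(i | D):
    weight(w) is proportional to sum_D sum_i p_ml(w | D, i) * prod_q p(q | D, i) / |D|."""
    weights = defaultdict(float)
    for doc in feedback_docs:
        for i in range(len(doc)):
            counts, z = propagated_counts(doc, i, sigma)
            q_lik = 1.0  # how strongly the query is "present" around position i
            for q in query:
                q_lik *= smoothed(counts, z, q, coll, mu)
            for w, c in counts.items():  # words near position i get a larger share c / z
                weights[w] += q_lik * (c / z) / len(doc)
    for q in set(query):  # expansion terms should be new words
        weights.pop(q, None)
    total = sum(weights.values()) or 1.0
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [(w, v / total) for w, v in ranked[:n_terms]]
```

With equal term counts and document length, a document in which the query terms occur next to each other should score higher under `plm_score` than one in which they are far apart, and `positional_expansion` should favor words that sit near query-term occurrences. Both functions recompute the propagated counts at every position, which is quadratic in document length; that keeps the sketch short but would be too slow for a collection such as NTCIR without precomputation.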
Keywords/Search Tags: term positions, proximity, Positional Language Model, pseudo relevance feedback