Research In Chinese Information Retrieval Based On Positional Language Models

Posted on:2016-04-17

Degree:Master

Type:Thesis

Country:China

Candidate:Y L Chen

GTID:2308330464472622

Subject:Computer application technology

Abstract/Summary:

With the rapid advance of global informatization, Chinese information resources on the Internet are becoming increasingly rich. How to extract valuable information from this massive volume of data has become a hot topic in the field of information retrieval, and improving retrieval technology has become a pressing need. In most existing retrieval models, documents are scored primarily on various kinds of term statistics, such as within-document frequencies, inverse document frequencies, and document lengths. However, these models fail to fully consider the proximity of matched query terms in the document. Although they have been applied to Chinese information retrieval with good results, there is still room for improvement. This thesis therefore focuses on Chinese information retrieval based on the Positional Language Model. The main original contributions are:

(1) We introduce the Positional Language Model into Chinese information retrieval, using term positions and proximity within the document to capture the position information of query terms, so that the retrieved documents are more relevant to the query topic. We segment Chinese text with 2-gram-based and dictionary-based segmentation methods and insert a space between words after segmentation. During modeling, position information is identified with the term as the unit, so that we consider not only positional proximity but also the complex relationships between terms. Experiments on the Chinese document test sets NTCIR-5 and NTCIR-6 show that Chinese information retrieval based on the positional language model achieves better retrieval performance than traditional retrieval methods.

(2) We also add the document position feature to the pseudo relevance feedback method to help the original query obtain more appropriate expansion terms. In this method, we incorporate the position information of terms in the feedback documents into the relevance model, use the positional language model to capture the positional relationships between words and query words in the feedback documents, and assign more weight to words closer to query words, so as to select terms more related to the query topic as expansion terms. Two methods of estimating pseudo relevance feedback are considered in this thesis: independent and identically distributed sampling, and conditional sampling. We verified both methods on NTCIR-5 with a dictionary-based index; the results show that Chinese pseudo relevance feedback based on the positional language model achieves better retrieval performance than traditional feedback methods.
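
The idea of segmenting text into terms and then building a language model at each position can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the overlapping character-bigram scheme, the Gaussian kernel, the `sigma` value, and the function names `bigram_segment` and `positional_lm` are all assumptions made for illustration, and smoothing against a collection model is omitted.

```python
import math
from collections import defaultdict


def bigram_segment(text):
    """Split text into overlapping character bigrams (an assumed 2-gram scheme)."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def gaussian_kernel(i, j, sigma):
    """Weight that a term at position j contributes to position i (assumed kernel)."""
    return math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2))


def positional_lm(terms, position, sigma=2.0):
    """Estimate p(w | D, position) from kernel-propagated term counts.

    Each occurrence of a term at position j adds gaussian_kernel(position, j)
    to that term's count, so nearby terms dominate the model at `position`.
    """
    if not terms:
        return {}
    propagated = defaultdict(float)
    for j, term in enumerate(terms):
        propagated[term] += gaussian_kernel(position, j, sigma)
    total = sum(propagated.values())
    return {w: c / total for w, c in propagated.items()}


# Terms are the unit of position, as in the abstract; joining them with
# spaces mirrors the "space between words after segmentation" step.
terms = bigram_segment("信息检索模型")
spaced = " ".join(terms)

# Language model at position 0: terms near the start get the most mass.
plm = positional_lm(terms, position=0, sigma=1.0)
```

In this sketch, a term that occurs close to the chosen position receives a higher probability than one far away, which is the proximity effect the abstract relies on both for scoring documents and for weighting candidate expansion terms near query words.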

Keywords/Search Tags:

term positions, proximity, Positional Language Model, pseudo relevance feedback

Related items

1	Cross Language Information Retrieval Based On Topical Pseudo Relevance Feedback
2	Semantic Positional Language Retrieval Models With A Proximity Information
3	Studies On Affinity Propagation Based Pseudo-Relevance Feedback And Document Expansion For Spoken Document Retrieval
4	Research On Retrieval Method Based On Positional Relationship In Document
5	Research On Relevance Feedback And Long-term Learning In Content Based 3D Model Retrieval
6	Research On Pre-trained BERT Based Pseudo-relevance Feedback Method
7	Research On Personalized Search Method Based On Language Model
8	Modeling Topic-based Semantics For Information Retrieval Models
9	Study On Open-Domain Question Answering
10	Research On Pseudo Relevance Feedback Based On Document Similarity