A Study Of Collection-based Features For Adapting The Balance Parameter In Pseudo Relevance Feedback

Posted on: 2017-11-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Meng
Full Text: PDF
GTID: 2348330515467330
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet technology, the number of documents on the Internet is increasing exponentially, and an important line of research concerns how to handle this large volume of online documents. Text information retrieval is the task of finding the documents in a collection that are most relevant to a user query. Various retrieval models have been proposed to handle large volumes of text, and pseudo-relevance feedback (PRF) is one of the most effective techniques for improving ad-hoc retrieval performance. For PRF methods, optimizing the balance parameter between the original query model and the feedback model is an important but difficult problem. Traditionally, the balance parameter is tested manually and set to a fixed value across collections and queries. However, because collections and individual queries differ, this parameter should be tuned differently for each. Recent research has studied various query-based and feedback-document-based features to predict the optimal balance parameter for each query on a specific collection, using a learning approach based on logistic regression. In this thesis, we hypothesize that the characteristics of collections are also important for this prediction. We propose and systematically investigate a series of collection-based features for queries, feedback documents and candidate expansion terms. The proposed features are fed into a logistic regression model to predict the feedback parameter. The thesis first reviews the current state of domestic and international information retrieval research and explains the significance of the work and its main research content. It then gives an overview of the development of information retrieval and PRF-related technologies. After a detailed description of the three types of features, the experimental process and results are presented. The experiments show that our method is competitive in improving retrieval performance, particularly for cross-collection prediction, in comparison with state-of-the-art approaches.
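
As an illustration of the role the balance parameter plays, the following is a minimal sketch and not the thesis implementation. It assumes the common linear interpolation form, in which a parameter alpha in [0, 1] mixes the original query model with the feedback model, and it assumes alpha is obtained from a logistic function over a feature vector. The function names, the feature values, the weights and the bias are all hypothetical; the abstract does not specify them.

```python
import math


def predict_alpha(features, weights, bias):
    """Logistic regression output in (0, 1), used here as the balance parameter.

    The features, weights and bias are placeholders; the actual
    collection-based features are not given in the abstract.
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def interpolate(query_model, feedback_model, alpha):
    """Mix two term distributions.

    Assumed convention: alpha = 0 keeps only the original query model,
    alpha = 1 keeps only the feedback model.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    terms = set(query_model) | set(feedback_model)
    return {
        t: (1.0 - alpha) * query_model.get(t, 0.0)
        + alpha * feedback_model.get(t, 0.0)
        for t in terms
    }


# Toy term distributions (made-up numbers, each summing to 1).
query_model = {"jaguar": 0.5, "speed": 0.5}
feedback_model = {"jaguar": 0.4, "car": 0.6}

# Hypothetical feature vector and model parameters for one query.
alpha = predict_alpha(features=[0.2, 1.5], weights=[0.8, -0.3], bias=0.1)
expanded_query = interpolate(query_model, feedback_model, alpha)
```

With a fixed alpha, every query on every collection receives the same mixture; predicting alpha per query, as in the sketch, lets the weight given to feedback terms vary with the query and the collection.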
Keywords/Search Tags:Information Retrieval, Pseudo-Relevance, Collection Characteristic