
Research Of Learning To Rank In Information Retrieval

Posted on: 2013-05-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Lin
GTID: 1228330395999264
Subject: Computer application technology
Abstract/Summary:
Ranking is an essential issue in information retrieval: documents are ordered by their expected relevance to a given query. Traditional approaches construct ranking models empirically and fall into two kinds: content-based methods, which depend on the relevance between the query and the document, and link-based methods, which depend on the importance of Web pages on the Internet. Both kinds are effective at meeting users' requirements, and a retrieval strategy typically chooses one of them as its ranking basis. However, a single ranking approach cannot improve ranking performance effectively, so more research has focused on merging ranking methods, and learning to rank is one of the most effective approaches in this field. Learning to rank is based on machine learning: it uses retrieval methods as features and learns a ranking model from relevance judgments, which can effectively improve ranking accuracy. Building on existing learning-to-rank approaches, this research aims to improve the prediction of relevance. The main work of this dissertation can be summarized as follows:
(1) This work investigates how existing listwise approaches can be improved. A group-based framework is proposed to address issues in the listwise framework. It uses two types of samples: a one-group sample, consisting of one document with a higher relevance label and a group of documents with lower labels; and a group-group sample, consisting of a group of documents with higher labels and a group of documents with lower labels. Two loss functions are defined for each sample type. Experimental results show that the group-based ranking method improves the performance of the likelihood and cross-entropy loss functions and, with it, ranking accuracy.
(2) The feature space is very important to the performance of learning-to-rank approaches, and features are constructed here in two ways. First, new features are constructed from existing learning-to-rank features: a semi-supervised learning method applies singular value decomposition (SVD) to an unlabeled data set to build them. Second, features are extracted from retrieval methods. Existing retrieval methods tend to select a single parameter value as optimal, which may not always be effective; therefore, taking the language model for information retrieval as a basis, ranking features are extracted with multiple parameter values, multiple context fields, and multiple smoothing methods to improve the ranking performance of the language model. Finally, these features are also used to expand the feature space of learning to rank and improve the performance of existing ranking approaches. Experiments on the LETOR data set show that the new features improve ranking performance and thus the performance of the ranking model.
(3) Learning to rank can be applied not only to re-ranking retrieval results but also to other areas of information retrieval. Here it is introduced into query expansion to extract expansion terms from social annotation resources: a term ranking model, built with a learning-to-rank approach, selects the expansion terms to improve query expansion. Experimental results on TREC data sets show that the term ranking model effectively improves ranking accuracy.
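The abstract names the two sample types of the group-based framework in item (1) and says a likelihood loss and a cross-entropy loss are defined for them, but it does not give their formulas. The sketch below is therefore only one plausible reading, assuming a linear scoring function and a softmax (top-one Plackett-Luce style) likelihood: a one-group sample is scored by the negative log-probability that the higher-labeled document is ranked first among itself and the lower-labeled group; a group-group sample averages that loss over the higher-labeled documents; and the cross-entropy variant compares the softmax over all scores with a target that is uniform over the higher-labeled group. All function names, the weight vector, and the toy feature values are hypothetical.

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax over a 1-D array of scores."""
    z = np.asarray(z, dtype=float)
    m = np.max(z)
    return z - m - np.log(np.sum(np.exp(z - m)))

def one_group_likelihood_loss(s_pos, s_neg):
    """One-group sample: one higher-labeled document (score s_pos) versus a
    group of lower-labeled documents (scores s_neg). Assumed form: negative
    log-probability that the higher-labeled document is ranked first."""
    scores = np.concatenate(([s_pos], np.asarray(s_neg, dtype=float)))
    return float(-log_softmax(scores)[0])

def group_group_likelihood_loss(s_pos_group, s_neg):
    """Group-group sample: average the one-group loss over every
    higher-labeled document (an assumed aggregation)."""
    return float(np.mean([one_group_likelihood_loss(s, s_neg) for s in s_pos_group]))

def group_group_cross_entropy_loss(s_pos_group, s_neg):
    """Cross entropy between a target distribution that is uniform over the
    higher-labeled group (zero elsewhere) and the softmax over all scores."""
    s_pos_group = np.asarray(s_pos_group, dtype=float)
    scores = np.concatenate((s_pos_group, np.asarray(s_neg, dtype=float)))
    target = np.zeros(len(scores))
    target[: len(s_pos_group)] = 1.0 / len(s_pos_group)
    return float(-np.sum(target * log_softmax(scores)))

# Usage with a hypothetical linear scorer f(x) = w . x
w = np.array([1.0, -0.5])
X_high = np.array([[2.0, 0.0], [1.5, 0.5]])  # higher-labeled documents
X_low = np.array([[0.5, 1.0], [0.0, 0.0]])   # lower-labeled documents
print(one_group_likelihood_loss(X_high[0] @ w, X_low @ w))
print(group_group_likelihood_loss(X_high @ w, X_low @ w))
print(group_group_cross_entropy_loss(X_high @ w, X_low @ w))
```

With a single higher-labeled document the two group-group losses reduce to the one-group likelihood loss, and raising the higher-labeled score lowers the loss, which is the behavior a listwise training objective needs.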
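Item (2) states only that new features are built from an unlabeled data set by SVD. A common way to do this, sketched here under that assumption, is to stack the existing learning-to-rank feature vectors of unlabeled query-document pairs into a matrix, compute its truncated SVD, and append each labeled pair's projection onto the top-k right singular vectors as k extra features. The centering step, the value of k, the helper names, and the random placeholder data are illustrative assumptions, not details taken from the dissertation.

```python
import numpy as np

def fit_svd_projection(X_unlabeled, k):
    """Learn a k-dimensional projection from unlabeled feature vectors.

    X_unlabeled: (n_pairs, n_features) matrix of existing learning-to-rank
    features for query-document pairs without relevance labels.
    """
    mean = X_unlabeled.mean(axis=0)
    # Economy-size SVD of the centered matrix; rows of Vt are right singular vectors.
    _, _, Vt = np.linalg.svd(X_unlabeled - mean, full_matrices=False)
    return mean, Vt[:k].T  # V_k has shape (n_features, k)

def expand_features(X, mean, V_k):
    """Append the k SVD projections to each original feature vector."""
    return np.hstack([X, (X - mean) @ V_k])

# Usage on random placeholder data (not LETOR): 10 base features, k = 3.
rng = np.random.default_rng(0)
X_unlabeled = rng.normal(size=(200, 10))
X_train = rng.normal(size=(20, 10))
mean, V_k = fit_svd_projection(X_unlabeled, k=3)
X_expanded = expand_features(X_train, mean, V_k)
print(X_expanded.shape)  # (20, 13)
```

The projection is fitted only on unlabeled data and then applied to the labeled training set, which is what makes the construction semi-supervised; any ranker can then be trained on the expanded vectors.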
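To show how a single language-model scorer from item (2) can yield many ranking features, the sketch below computes query-likelihood scores under two standard smoothing methods, Jelinek-Mercer (parameter lambda) and Dirichlet prior (parameter mu), for several parameter values and for each document field, emitting one feature per (field, smoothing method, parameter) combination. The field names, the parameter grid, whitespace tokenization, and the small probability floor for unseen terms are all assumptions for illustration; the abstract does not list the dissertation's actual fields or settings.

```python
import math
from collections import Counter

def query_likelihood(query_terms, doc_tf, doc_len, coll_prob, method, param):
    """Log query likelihood log P(q|d) under a smoothed document language model."""
    score = 0.0
    for w in query_terms:
        p_c = coll_prob.get(w, 1e-9)  # assumed floor for terms unseen in the collection
        tf = doc_tf.get(w, 0)
        if method == "jm":     # Jelinek-Mercer: (1 - lambda) * P_ml(w|d) + lambda * P(w|C)
            p_ml = tf / doc_len if doc_len else 0.0
            p = (1 - param) * p_ml + param * p_c
        elif method == "dir":  # Dirichlet prior: (tf + mu * P(w|C)) / (|d| + mu)
            p = (tf + param * p_c) / (doc_len + param)
        else:
            raise ValueError(f"unknown smoothing method: {method}")
        score += math.log(p)
    return score

# Hypothetical parameter grid: one feature per (field, method, parameter).
GRID = {"jm": [0.1, 0.5, 0.9], "dir": [100, 1000, 2000]}

def collection_model(docs, field):
    """Maximum-likelihood collection language model P(w|C) for one field."""
    counts = Counter()
    for d in docs:
        counts.update(d[field].split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def lm_features(query, doc, coll_models, fields=("title", "body")):
    q = query.split()
    feats = {}
    for f in fields:
        toks = doc[f].split()
        tf, n = Counter(toks), len(toks)
        for method, params in GRID.items():
            for p in params:
                feats[f"{f}_{method}_{p}"] = query_likelihood(q, tf, n, coll_models[f], method, p)
    return feats

# Usage on two toy documents with hypothetical "title" and "body" fields.
docs = [
    {"title": "learning to rank", "body": "ranking models learned from relevance judgments"},
    {"title": "query expansion", "body": "expansion terms from social annotations"},
]
coll = {f: collection_model(docs, f) for f in ("title", "body")}
feats = lm_features("learning to rank", docs[0], coll)
print(len(feats))  # 12
```

Rather than tuning one lambda or mu and discarding the rest, every setting becomes a separate column, so the learning-to-rank model can weight them.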
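Item (3) describes a term ranking model, built with learning to rank, that selects expansion terms from social annotations, but the abstract does not specify the term features or the learning algorithm. As a minimal sketch, assume each candidate tag is described by a small feature vector and that training labels grade how useful each candidate was for expansion. A pairwise linear ranker trained by gradient descent on the pairwise logistic loss (a RankNet-style objective, swapped in here purely for illustration) then scores candidates, and the top k are appended to the query. The feature meanings, labels, and candidate terms are hypothetical.

```python
import numpy as np

def train_pairwise_ranker(X, y, epochs=200, lr=0.1):
    """Learn weights w so that w.x_i > w.x_j whenever y_i > y_j, by gradient
    descent on the mean pairwise logistic loss log(1 + exp(-(s_i - s_j)))."""
    w = np.zeros(X.shape[1])
    pairs = [(i, j) for i in range(len(y)) for j in range(len(y)) if y[i] > y[j]]
    for _ in range(epochs):
        grad = np.zeros_like(w)
        for i, j in pairs:
            diff = X[i] - X[j]
            margin = w @ diff
            # d/dw log(1 + exp(-margin)) = -diff * sigmoid(-margin) = -diff / (1 + exp(margin))
            grad -= diff / (1.0 + np.exp(margin))
        w -= lr * grad / max(len(pairs), 1)
    return w

def select_expansion_terms(candidates, X_cand, w, k):
    """Rank candidate terms by the learned linear score and keep the top k."""
    order = np.argsort(-(X_cand @ w))
    return [candidates[i] for i in order[:k]]

# Hypothetical features per candidate tag:
# [co-occurrence with the query's tags, overall tag frequency]
X = np.array([[0.9, 0.2], [0.1, 0.8], [0.7, 0.5], [0.2, 0.1]])
y = np.array([2, 0, 1, 0])  # hypothetical graded usefulness labels
cands = ["retrieval", "photo", "search", "fun"]
w = train_pairwise_ranker(X, y)
print(select_expansion_terms(cands, X, w, k=2))  # ['retrieval', 'search']
```

A pairwise objective fits this task because only the relative order of candidate terms matters when the top k are picked, not their absolute scores.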
Keywords/Search Tags: Information Retrieval, Learning to Rank, Loss Function, Feature Extraction, Query Expansion