Theoretical Analysis On Direct Optimization Of Information Retrieval Measures In Learning To Rank

Posted on:2011-12-16

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Y He

Full Text:PDF

GTID:1488303311492504

Subject:Computer application technology

Abstract/Summary:

PDF Full Text Request

As World Wide Web grows rapidly, it has become the biggest source of informa-tion for people. As a result, it also becomes more and more difficult for people to get what they want from the huge volume of information in World Wide Web. Noticing this, search engines have been proposed to provide people a convenient way to retrieve the wanted information from World Wide Web. The key factor of the search engines is a ranking model that returns a ranked document list to the people based on a sub-mitted query. How to design a good ranking model is a very important and popular topic in the area of information retrieval (IR). Recently, learning to rank, which applies machine learning techniques to IR, has attracted more and more attentions because of its successfulness.Direct optimization of IR evaluation measures has become an important branch of learning to rank for IR. Since IR evaluation measures are difficult to optimize due to their non-continuity and non-differentiability, most direct optimization methods op-timize some surrogate functions instead, which we call surrogate measures. A critical issue regarding these methods is whether the optimization of the surrogate measures can really lead to the optimization of the original IR evaluation measures. As far as we know, there no related work that have tried to answer this critical issue. Therefore, the theoretical properties of the direct optimization methods are still unclear to the people.In this thesis, we perform formal analysis on this issue. We first propose two concepts named�directness�and�tendency correlation�to describe the relationship between a surrogate measure and its corresponding IR evaluation measure. Based on these two concepts, we then analyze the surrogate measures optimized in a number of popular direct optimization methods. Our theoretical findings can explain the experi-mental results observed on public benchmark datasets.?We first propose two concepts named�directness�and�tendency correlation�to describe the relationship between a surrogate measure and its corresponding IR evaluation measure. We show that when a surrogate measure has arbitrarily large directness or arbitrarily strong tendency correlation to an IR evaluation measure, the optimization of it will lead to the effective optimization of the original IR evaluation measure. ?We then analyze both the directness and the tendency correlation of the surrogate measures optimized in a number of direct optimization methods. We prove that the surrogate measures in SoftRankNDCG, ApproxRankMMAP, and ApproxRankNDCG can have arbitrarily large directness and arbitrarily strong tendency correlation with the original IR evaluation measures, regardless of the data distribution, when some parameters are appropriately set. However, the surrogate measures in SVMMAP, DORMNDCG, PermuRankMAP, and SVMNDCG cannot have arbitrar-ily large directness and arbitrarily strong tendency correlation with the original IR evaluation measures on certain distributions of data. Therefore SoftRankNDCG, ApproxRankMMAP, and ApproxRankNDCG are theoretically sounder than SVMMAP, DORMNDCG, PermuRankMAP, and SVMNDCG, and are expected to result in better ranking performances.?We perform experiments on public benchmark datasets and validate our theo-retical analysis on the popular direct optimization methods. This indicates the correctness to use�directness�and�tendency correlation�to describe the rela-tionship between a surrogate measure and its corresponding IR evaluation mea-sure.

Keywords/Search Tags:

Information Retrieval, Learning to Rank, Direct Optimization, Directness, Tendency Correlation

PDF Full Text Request

Related items

1	Research On Learning To Rank Models Of Directly Optimizing Evaluation Measures
2	Research On Learning To Rank For Information Retrieval
3	Researches On Information Retrieval Model Based On The Algorithm Of Learning To Rank
4	Research Of Learning To Rank In Information Retrieval
5	Research On Information Retrieval Model Based On Learning To Rank
6	Application Research Of Network Public Opinion Monitoring Based On Text Tendency Correlation Analysis
7	Research Of Learning To Rank In Information Retrieval
8	Research Of Learning To Rank Based On Directly Optimizing Evaluation Measures In Information Retrieval
9	Learning To Rank For Biomedical Information Retrieval
10	Rank Optimization For Person Re-identification Through Intelligent Machine Learning Techniques