A regression framework for learning to rank in web information retrieval

Posted on:2009-12-11

Degree:Ph.D

Type:Dissertation

University:State University of New York at Buffalo

Candidate:Zheng, Zhaohui

Full Text:PDF

GTID:1448390002492183

Subject:Computer Science

Abstract/Summary:

Machine learning approaches for learning ranking functions have been generating much interest from both the web information retrieval community and the machine learning community recently. It has the promise of improved relevancy of search engines and reduced demand for manual parameter tuning. We focus on developing a regression framework for learning to rank with complex loss functions. More specifically, this framework first applies functional iterative or boosting algorithm to compute updates for a given loss function and then fit the updates with a standard regression base learner. We explore supervised learning methodology from machine learning, and we distinguish two types of relevance judgments used as the training data: (1) absolute relevance judgments arising from explicit labeling of query-document pairs; and (2) relative relevance judgments extracted from user click-throughs of search results or converted from the absolute relevance judgments. Within the framework, we propose three novel ranking algorithms and illustrate their application to web search ranking. The first one is to calibrate the existing point-wise(univariant) regression loss to incorporate query difference in terms of introducing nuisance parameters in the statistical models, and we present an alternating optimization method to simultaneously learn the retrieval function and the nuisance parameters. It is an improvement over the existing approach within the category of learning to rank using point-wise regression loss. The second is an extension of gradient boosting methods for point-wise regression loss to complex(multi-variant) loss functions. It is based on optimization of quadratic upper bounds of the loss functions which allows us to present a rigorous convergence analysis of the algorithm. We illustrate an application of this approach in pair-wise preference learning to rank for Web search by combining both preference data and labeled data. The third one is a list-wise approach based on minimum effort optimization that takes into account the entire training data within a query at each iteration. We tackle this optimization problem using functional iterative methods where the update at each iteration is computed by solving an isotonic regression problem. This more global approach results in faster convergency and signficantly improved performance of the learned ranking functions over existing state-of-the-art methods.;Experimental results are carried out using both data sets obtained from a commercial search engine and widely used IR benchmarking data, namely OHSUMED and TREC. Our results show significant improvements of our proposed methods over existing state-of-the-art methods.

Keywords/Search Tags:

Rank, Web, Regression, Framework, Data, Functions, Methods, Relevance judgments

Related items

1	Reduced-rank regression of functional data
2	Research On Regression Analysis Based On Low-Rank Representation Algorithm For Noise Data
3	Classication and Regression Framework for Characterizing Contaminant Source Zone
4	Research On The Methods Of Cross Project Software Defect Prediction Based On Learning To Rank Approach
5	Research On Key Technologies Of Relevance Vector Regression Metamodeling And Their Application
6	Methods Design Of Search Engine Web Page Relevance Assessment And Application In Rank Model
7	Robust Low Rank Matrix Recovery And Application Of Sparse Logistic Regression Model
8	A Relevance-theoretic Study Of The Constraining Functions Of Discourse Markers In TV Interview
9	Principal Component Analysis Methods Based On Sparse And Low-rank Constraints
10	Low-rank Representation Based Discrimination And Regression Based Classification