Learning to rank with partially-labeled data

Posted on:2010-03-10

Degree:Ph.D

Type:Thesis

University:University of Washington

Candidate:Duh, Kevin K

Full Text:PDF

GTID:2448390002486041

Subject:Engineering

Abstract/Summary:

Ranking is a key problem in many applications. In web search, for instance, webpages are ranked such that the most relevant ones are presented to the user first. In machine translation, a set of hypothesized translations are ranked so that the correct one is chosen. Abstractly, the problem of ranking is to predict an ordering over a set of objects. Given the importance of ranking in many applications, "Learning to Rank" has risen as an active research area, crossing disciplines such as machine learning and information retrieval. The approach is to adapt machine learning techniques developed for classification and regression problems to problems with rank structure. However, so far the majority of research has focused on the supervised learning setting. Supervised learning assumes that the ranking algorithm is provided with labeled data indicating the rankings or permutations of objects. Such labels may be expensive to obtain in practice.;The goal of this dissertation is to investigate the problem of ranking in the framework of semi-supervised learning. Semi-supervised learning assumes that data is only partially labeled, i.e. for some sets of objects, labels are not available. This kind of framework seeks to exploit the potentially vast amount of cheap unlabeled data in order to improve upon supervised learning. While both supervised learning for ranking and semi-supervised learning for classification have become active research themes, the combination, semi-supervised learning for ranking, has been less examined. This thesis aims to fill the gap.;The contribution of this thesis is an examination of several ways to exploit unlabeled data in ranking. In particular, four assumptions commonly used in classification (Change of Representation, Covariate Shift, Low Density Separation, Manifold) are extended to the ranking setting. Their implementations are tested on six real-world datasets from Information Retrieval, Machine Translation, and Computational Biology. The algorithmic contributions of this work include (a) a Local/Transductive meta-algorithm, which allows one to plug in different unlabel data assumptions with relative ease, and (b) a kernel defined on lists, which allow one to extend methods which work with samples (i.e. classification, regression) to methods which work with lists of samples (i.e. ranking). We demonstrate that several assumptions about how unlabeled data helps in classification can be successfully applied to the ranking problem, showing improvements over the supervised baseline under different dataset-method combinations.

Keywords/Search Tags:

Ranking, Data, Problem, Classification, Supervised

Related items

1	From Ordinal Ranking to Binary Classification
2	Nonparallel Hyperplanes Classifier For Semi-supervised Classification
3	A Business Hall Location Model Based On Semi-supervised Learning Ranking Network
4	A Classification Method For PU Problem Based On Data Distribution And Text Similarity
5	Protein sequence classification with Bayesian supervised and semi-supervised learned classifiers
6	Visual Saliency Objects Detection With Graph Based Semi-supervised Learning And Ranking Models
7	Learning with unlabeled data
8	Research On Web Search Ranking Based On Semi-supervised Learning
9	Semi-Supervised Adaptive Classification Over Data Stream
10	Bootstrap Dual Complementary Hashing With Semi-supervised Re-ranking On Large Scale Image Retrieval Problem