Research On Semi-supervised Ranking Algorithms

Posted on: 2015-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: Z G Miao
GTID: 2268330428999841
Subject: Computer software and theory
Abstract/Summary:
Learning to rank is an active research topic in information retrieval and machine learning, with applications in problems such as document retrieval, collaborative filtering, and natural language parsing. The goal of learning to rank is to automatically learn a ranking model from training data using machine learning techniques. Many algorithms have been developed for the ranking problem. Depending on the input representation and loss function, they fall into three categories: the pointwise, pairwise, and listwise approaches.

Learning to rank is an instance of supervised learning, and therefore requires a labeled training set. In practical applications, however, obtaining labeled data is time-consuming and expensive. To exploit the large amount of unlabeled data, it is natural to consider the problem of semi-supervised ranking. Using semi-supervised learning techniques to exploit the implicit information in unlabeled data helps reduce labeling costs and improve the performance of the ranking algorithm. Hence, this thesis aims to develop semi-supervised ranking algorithms for the task of learning to rank. The main contributions of our work are as follows.

First, we propose a general framework of regularized boosting for semi-supervised ranking and, based on it, develop a semi-supervised ranking algorithm built on RankBoost. Regularization is a widely used semi-supervised learning technique that forces the learner to exploit unlabeled data by adding an extra regularization penalty to the usual loss function. Boosting is a simple and efficient ensemble learning method with theoretical justification; it obtains a better-performing model by iteratively building a linear combination of weak models. Combining these two techniques, we adapt the supervised ranking algorithm RankBoost to the semi-supervised setting. Specifically, we augment the traditional loss with a regularization penalty term that embodies the smoothness assumption from semi-supervised learning, so that similar examples receive similar ranking scores. Furthermore, we derive an iterative training procedure, based on the boosting method, to optimize the resulting loss function. The resulting algorithm both makes reasonable use of the semi-supervised assumption and retains the simplicity and efficiency of boosting.

Second, we propose a general framework for extending listwise ranking algorithms to the semi-supervised setting. Under this framework, the algorithm first labels some of the unlabeled examples according to given rules, and then runs a traditional listwise algorithm on the augmented dataset. Specifically, we extend AdaRank, one of the state-of-the-art listwise ranking algorithms, to the semi-supervised setting, using the label propagation algorithm to label the unlabeled examples. Benefiting from the advantages of the listwise approach, the resulting algorithm is expected to substantially improve the performance of semi-supervised ranking.

Finally, experiments on the publicly available LETOR dataset, comparing against an existing semi-supervised ranking algorithm, demonstrate the feasibility of the proposed frameworks and the effectiveness of the corresponding algorithms.
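
The two contributions can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not give the exact penalty term, the similarity graph, or the label propagation variant, so the sketch assumes a squared-difference smoothness penalty added to a RankBoost-style pairwise exponential loss, and a simple clamped, iterative label propagation over a similarity matrix. All function and parameter names (`regularized_rank_loss`, `propagate_labels`, `lam`, `n_iter`) are hypothetical.

```python
import math


def regularized_rank_loss(scores, preferred_pairs, similarity, lam):
    """Pairwise exponential loss plus an assumed smoothness penalty.

    scores: ranking scores f(x_i) for labeled and unlabeled examples.
    preferred_pairs: (i, j) pairs meaning example i should rank above j.
    similarity: dict mapping (i, j) to a non-negative similarity weight.
    lam: regularization strength (hypothetical name).
    """
    # RankBoost-style exponential loss over the labeled preference pairs.
    pair_loss = sum(math.exp(-(scores[i] - scores[j]))
                    for i, j in preferred_pairs)
    # Smoothness assumption: similar examples should get similar scores.
    smoothness = sum(w * (scores[i] - scores[j]) ** 2
                     for (i, j), w in similarity.items())
    return pair_loss + lam * smoothness


def propagate_labels(similarity, labels, n_iter=50):
    """Clamped iterative label propagation (one common variant).

    similarity: n x n list of lists of non-negative weights.
    labels: relevance value for labeled examples, None for unlabeled ones.
    Returns estimated relevance values for all examples.
    """
    n = len(labels)
    est = [0.0 if y is None else float(y) for y in labels]
    for _ in range(n_iter):
        updated = []
        for i in range(n):
            if labels[i] is not None:
                # Labeled examples keep their given relevance.
                updated.append(float(labels[i]))
                continue
            total = sum(similarity[i])
            if total == 0:
                # Isolated example: nothing to propagate from.
                updated.append(est[i])
            else:
                # Weighted average of the neighbors' current estimates.
                updated.append(
                    sum(similarity[i][j] * est[j] for j in range(n)) / total
                )
        est = updated
    return est
```

In the second framework, the estimates returned by a step like `propagate_labels` would be used to label some unlabeled examples before a listwise learner such as AdaRank is trained on the augmented set. How many examples to label, and by what rule, is left open here because the abstract does not specify it.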
Keywords/Search Tags: Learning to rank, Semi-supervised learning, Regularization, Boosting, RankBoost, AdaRank