
On Some Key Problems In Semi-Supervised Ranking

Posted on: 2015-09-19    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Z B Pan    Full Text: PDF
GTID: 1220330428965936    Subject: Communication and Information System
Abstract/Summary:
Ranking is a central problem in information retrieval and plays an increasingly important role in many real applications, such as search engines, collaborative filtering, drug discovery, and bioinformatics. Ranking aims to return an ordered list according to the given training samples. Owing to the great success of the support vector machine in the machine learning community, learning has become the dominant approach to the ranking problem. Learning to rank is now a new class of learning problem and an active research topic in the machine learning community, parallel to classification and regression.

However, most current research on learning to rank focuses on the supervised setting, whereas in real applications unlabeled instances are much easier to obtain than labeled ones. We are often in the situation where a large amount of unlabeled data coexists with only a few labeled instances, namely the problem of semi-supervised learning to rank. This thesis therefore focuses on feature extraction, model design, and analysis for semi-supervised ranking. The main contributions are as follows.

1. To address the problem that current semi-supervised ranking models do not exploit the information carried by the values of the instance labels, or by the magnitudes of the differences between instance labels, the thesis proposes two semi-supervised ranking models: (a) Graph-based Transductive Ranking, built on the similarity matrix of the graph; we derive the explicit form of its solution, which yields rating scores for the unlabeled samples involved in the training stage. (b) Magnitude-Preserving Semi-Supervised Graph-based Ranking, which adopts the least-squares ranking loss; we prove a representer theorem, obtain the explicit form of its solution, give an upper bound on the generalization error, and show that the generalization performance is closely related to the properties of the similarity matrix of the graph. Experimental results on recommendation datasets and quantitative structure-activity relationship analysis datasets show that the proposed algorithm usually outperforms many other state-of-the-art learning-to-rank algorithms. (A generic form of such a graph-regularized objective is sketched after this abstract.)

2. To address the shortage of kernel functions for semi-supervised learning to rank, and the fact that few of them have orthogonal components in the nonlinear feature map, the thesis constructs two different Legendre kernel functions: (a) the Orthogonal Legendre Kernel, based on the orthogonal Legendre polynomials, and (b) the Generalized Legendre Kernel, based on the generalized Legendre polynomials. The components of the nonlinear mapping determined by the Legendre kernels are mutually orthogonal, which reduces data redundancy. Experimental results on classification and regression tasks indicate that, compared with the existing Chebyshev kernel functions, the support vector machine with the proposed Generalized Legendre Kernel usually has fewer support vectors, higher stability, and better generalization performance. (A minimal kernel sketch likewise follows the abstract.)

3. To address the small-sample-size problem faced by linear discriminant analysis in many real applications, the thesis designs two novel linear discriminant analysis criteria: (a) Weighted Sum Discriminant Analysis, which combines two different measures, the within-class scatter and the between-class similarity. Its optimization model reduces to an eigendecomposition problem, so it can overcome the small-sample-size problem and extract more features than the traditional criterion. (b) Range Space Linear Discriminant Analysis, a two-stage criterion that first projects all samples onto the range space of the between-class scatter matrix and then performs traditional linear discriminant analysis. Compared with several popular discriminant analysis criteria, it achieves competitive accuracy while reducing computation time efficiently. (A rough sketch of this two-stage procedure also follows below.)
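To make contribution 1 more concrete, the sketch below gives a generic magnitude-preserving, graph-regularized least-squares ranking objective of the kind common in this line of work; the notation, the pair set, and the two regularization weights are assumptions for illustration, since the abstract does not state the thesis's exact formulation.

\[
\min_{f \in \mathcal{H}_K} \;\; \sum_{(i,j) \in \mathcal{P}} \Bigl[ \bigl(f(x_i) - f(x_j)\bigr) - \bigl(y_i - y_j\bigr) \Bigr]^2 \;+\; \lambda_A \,\lVert f \rVert_K^2 \;+\; \lambda_I \, \mathbf{f}^{\top} L \, \mathbf{f}
\]

Here \(\mathcal{P}\) indexes pairs of labeled instances, \(\mathbf{f} = (f(x_1), \ldots, f(x_{l+u}))^{\top}\) collects the scores on all labeled and unlabeled samples, and \(L = D - W\) is the graph Laplacian built from the similarity matrix \(W\). The first term is a magnitude-preserving least-squares ranking loss (it penalizes deviations from the magnitudes of label differences, not just their order), the second is an ambient kernel regularizer, and the third couples the scores of similar labeled and unlabeled samples through the graph. For objectives of this form, a representer theorem expresses the minimizer as a finite kernel expansion over the training samples, which is how an explicit, closed-form solution of the kind mentioned above can be obtained.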
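For contribution 2, the following is a minimal Python sketch of a Legendre-polynomial kernel. The function name, the degree parameter, and the multiplicative combination across dimensions are assumptions for illustration; the thesis's exact definitions of the Orthogonal and Generalized Legendre Kernels are not given in the abstract.

```python
import numpy as np
from scipy.special import eval_legendre  # Legendre polynomial P_n(x), orthogonal on [-1, 1]

def legendre_kernel(x, z, degree=4):
    """Hypothetical sketch of an orthogonal-polynomial (Legendre) kernel.

    Per dimension, sum the products P_n(x_d) * P_n(z_d) for n = 0..degree,
    then combine dimensions with a product (a common construction for
    Chebyshev/Legendre kernels; products of valid kernels remain valid kernels).
    Inputs are assumed to be scaled to [-1, 1] beforehand.
    """
    x, z = np.asarray(x, dtype=float), np.asarray(z, dtype=float)
    k = 1.0
    for xd, zd in zip(x, z):
        k *= sum(eval_legendre(n, xd) * eval_legendre(n, zd)
                 for n in range(degree + 1))
    return k

# Usage with an SVM via a precomputed Gram matrix, e.g. in scikit-learn:
#   K = np.array([[legendre_kernel(a, b) for b in X_train] for a in X_train])
#   SVC(kernel="precomputed").fit(K, y_train)
```

Because the feature map consists of orthogonal polynomial components, such a kernel carries little redundancy across components, which is the property the abstract emphasizes.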
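Finally, for the Range Space Linear Discriminant Analysis of contribution 3(b), the code below is a rough sketch of the two-stage procedure described above: project onto the range space of the between-class scatter, then run classical Fisher LDA there. The eigenvalue threshold and the small ridge term are assumptions added for numerical stability, and the thesis's formulation may differ in detail.

```python
import numpy as np
from scipy.linalg import eigh

def range_space_lda(X, y, tol=1e-10):
    """Two-stage sketch: (1) project onto range(S_b), (2) classical LDA there."""
    classes = np.unique(y)
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    S_b = np.zeros((d, d))   # between-class scatter
    S_w = np.zeros((d, d))   # within-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        diff = (mc - mean_all)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
        S_w += (Xc - mc).T @ (Xc - mc)

    # Stage 1: orthonormal basis of the range space of S_b -- its eigenvectors
    # with non-negligible eigenvalues (rank is at most n_classes - 1).
    vals_b, vecs_b = eigh(S_b)
    U = vecs_b[:, vals_b > tol * max(vals_b.max(), 1.0)]

    # Stage 2: classical LDA in the reduced space, i.e. the generalized
    # eigenproblem S_b' v = lambda * S_w' v. A small ridge (an assumption
    # here) keeps the reduced within-class scatter invertible when samples
    # are scarce.
    Sb_r = U.T @ S_b @ U
    Sw_r = U.T @ S_w @ U + 1e-8 * np.eye(U.shape[1])
    vals, vecs = eigh(Sb_r, Sw_r)              # eigenvalues in ascending order
    W = U @ vecs[:, ::-1][:, :len(classes) - 1]
    return W                                    # columns are discriminant directions
```

New samples can then be projected as X_new @ W before applying any standard classifier; because stage 1 works in a space of dimension at most the number of classes minus one, the stage-2 eigenproblem stays small even when the original dimension is much larger than the sample size.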
Keywords/Search Tags: Learning to Rank, Semi-Supervised Ranking, Magnitude-Preserving Semi-Supervised Graph-based Ranking, Generalized Legendre Kernel, Weighted Sum Discriminant Analysis