Font Size: a A A

Research On Semi-Supervised Learning-to-rank Algorithm Based On Low-Rank Graph

Posted on:2019-07-20Degree:MasterType:Thesis
Country:ChinaCandidate:D Q DuFull Text:PDF
GTID:2428330566473401Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Learning-to-rank is the core technology in the field of information retrieval,and it is also one of the hot topics in the field of machine learning.It is widely used in many fields such as document retrieval,collaborative filtering,computing advertising,intelligent question answering systems and so on.The essence of Learning-to-rank is to use machine learning theory and techniques to learn a ranking function from training data and automatically sort new sample objects.Currently,researchers have proposed a lot of Learning-to-rank algorithms,which can be divided into three categories according to the difference of both input form and loss function:pointwise approach,pairwise approach,and listwise approach.Learning-to-rank is a supervised machine learning technique,so training samples are all labeled.However,in practical applications,the collected sample data is often unlabeled and needs to be manually tagged.When faced with massive data,the workload is enormous and time-consuming and labor-intensive.Therefore,semi-supervised Learning-to-rank has become a hot topic in the field of Learning-to-rank.It only requires manual annotation of a small number of samples to learn the entire sample set and greatly reduces the labeling cost.Therefore,the paper aims to study the efficient Learning-to-rank algorithm in the semi-supervised situation,and studies the two stages of semi-supervised Learning-to-rank.One is to use semi-supervised learning to tag unlabeled samples,and the other aims to improve the listwise ranking model.The main contributions of the paper are as follows:Firstly,the paper proposes a general framework of semi-supervised Learning-to-rank based on low-rank graphs,and designs a semi-supervised RankSVM ranking algorithm based on this framework.For example,we apply Learning-to-rank technology to document retrieval.First,we formulate the problem of constructing the low-rank graph of the documents as solving a constrained low-rank matrix optimization problem,and use the Lagrange multiplier method to optimize it.Second,we use the label propagation algorithm?LPA?to iteratively propagate the tag of the labeled document to the unlabeled document on the low-rank graph.In order to overcome the weakness that it is time-consuming to perform LPA on large graph,the paper takes Graphx which is a graph-parallel computing framework on Spark distributed platform to perform LPA on low-rank graph,which improves the efficiency of labeling sample.Finally,all labeled samples are used for training the RankSVM ranking model.The experimental results show that the proposed semi-supervised ranking algorithm based on low-rank graph can effectively use the unlabeled sample information and improve the accuracy and efficiency of the ranking.Secondly,in order to solve the problem that most current listwise ranking models fail to emphasize the top ranking accuracy on the ranking list,the paper considers leveraging the cost-sensitive listwise ranking model,but the current cost-sensitive listwise ranking models ignore the effect of high-dimensional features including verbosity Remaining features and weak features on both accuracy and computational efficiency of the model.Therefore,the paper seeks to incorporate both cost-sensitive learning and sparse learning into the ListMLE model,thus proposes a new algorithm called SparseCSListMLE.Specifically,In order to achieve cost-sensitivity,different documents are imposed different weights on the probability function,so that greater losses will be produced when the documents on the top of ground truth ranking list are ranked behind.For the sake of achieving sparsity,the L1 regularization term is added to the ranking model,and an improved proximal gradient descent learning algorithm is used to obtain the model parameters.The comparison experiments on the public datasets show that SparseCSListMLE has improved both the ranking accuracy and the computational speed compared to the baseline algorithms.At last,the paper combines the results of the above two phases,a novel semi-supervised listwise ranking algorithm based on the low-rank graph is proposed.Firstly,the unlabeled samples are labeled by performing label propagation algorithm on the low-rank graph,and then all labeled samples are inputted into the ranking model SparseCSListMLE,and the experiments on the Letor dataset demonstrate the effectiveness of the algorithm.
Keywords/Search Tags:learning-to-rank, semi-supervised ranking, low-rank graph, label propagation algorithm, cost-sensitive learning, sparse representation, Sparse_CS_ListMLE
PDF Full Text Request
Related items