Font Size: a A A

Research On Learning To Rank Algorithm For Information Retrieval

Posted on:2019-02-16Degree:MasterType:Thesis
Country:ChinaCandidate:D L WangFull Text:PDF
GTID:2348330545998789Subject:Software engineering
Abstract/Summary:PDF Full Text Request
Learning to rank is a technology which combine machine learning and information retrieval.This method is a study between classification and regression,it is to solve the questions about ranking.The so called learning to rank is which use a set of tagged data and machine learning techniques automatically generates a ranking model that meets the user's requirements.Since learning to rank is widely used in document retrieval,recommendation system and other fields,learning to rank in recent years has attracted more and more researchers' attention.First,this thesis introduces the basic concepts of learning to rank,then introduces three groups of learning methods:pointwise method based on a single sample,pairwise method based on a pair sample of the document and the listwise based on the sample list of documents.In this thesis,the representative algorithms of three methods are introduced,and the traditional learning to rank algorithms are analyzed and the advantages and disadvantages are summarized.Then this thesis introduces some ranking algorithms based on direct optimization of rank measure,considering that such algorithms can get dependable ranking model,they often do not work well in some real situations,such as high dimension and large-scale.In this way,two more efficient learning to rank algorithms are proposed.Specifically,the thesis mainly includes the following aspects:(1)The most existing work on learning to rank assumes that the training data is clean,but due to the ambiguity of query intent,the lack of domain knowledge,and the vague definition of relevance levels,all make it difficult for common annotators to give reliable relevance labels to some documents.As a result,the relevance labels in the training data of learning to rank usually contain noise.If ranking model ignores this fact,the performance of learning to rank algorithms will be damaged.In view of this problem,an algorithm based on multi-objective learning to rank algorithm is proposed.The algorithm firstly defines three objectives:the gain function,the risk function and the number of samples,and the three targets are optimized by the NSGA-II evolutionary algorithm.Then,a Hybridization-encouraged Mechanism selection strategy was proposed.And using this strategy can further improve the performance of the algorithm.Experiments on the benchmark data set verify the effectiveness of the proposed algorithm.(2)Compared with the existing works,which mainly focus on the accuracy of the model,it is more challenging to design the ranking model with sparsity.To address the issue,in this paper,a mini-batch stochastic gradient method for sparse ranking was presented.The algorithm begins with formulating sparse learning to rank as a mini-batch based convex optimization problem with L1 regularization.Then for the problem that simple adding L1 term does not necessarily induce the sparsity,FTRL(Follow-The-Regularized-Leader)method is adopted for inner optimization,which can obtain good solution with high sparsity.A feature-wise learning rates updating strategy is also suggested,and with this strategy,the performance of the proposed algorithm can be further improved.Empirical studies on LETOR data sets confirm the significant advantage of the algorithm in comparison with several baseline methods.
Keywords/Search Tags:Learning to rank, Noise, NSGA-?, Sparse Model, FTRL, Mini-batch
PDF Full Text Request
Related items