Research On Learning To Rank For Information Retrieval

Posted on: 2018-10-11  Degree: Doctor  Type: Dissertation
Country: China  Candidate: K Chen  Full Text: PDF
GTID: 1368330623450469  Subject: Computer Science and Technology
Abstract/Summary:
Learning to rank is a family of methods that use machine learning to predict the relevance of information retrieval results and rank them accordingly. It has been widely used in search engines, multimedia search and recommendation systems. However, with the development of network technology, the data handled by information retrieval has become more diverse, including text, images and other unstructured data. Different types of data often require different learning-to-rank algorithms. How to apply learning to rank effectively to all of these kinds of data is therefore a challenging problem, and it is also a hot topic in information retrieval and machine learning. The main work and contributions of this dissertation are summarized as follows:

(1) A ranking support vector machine algorithm based on kernel approximation is proposed. Learning methods for nonlinear RankSVM are still time-consuming because the kernel matrix must be computed. We propose a fast ranking algorithm based on kernel approximation that avoids computing the kernel matrix. We explore two kernel approximation methods, namely the Nyström method and random Fourier features. After the nonlinear kernel approximation, a primal truncated Newton method is used to optimize the pairwise L2-loss (squared hinge loss) objective function of the ranking model. Experimental results demonstrate that the proposed algorithm significantly reduces the training time of kernel RankSVM while maintaining the same ranking performance.

(2) A flexible ranking extreme learning machine (ELM) based on a query-level normalized ranking loss is proposed. The query-level normalized loss function is used to avoid training a biased model. A matrix-centering transformation is then used to reformulate the loss function, which greatly simplifies the learning process because the centering matrix is symmetric and idempotent. Three ranking ELM algorithms are implemented on top of the matrix-centering transformation: (a) a regularized ranking ELM with better generalization performance; (b) an enhanced incremental ranking ELM, which adds hidden nodes incrementally and obtains a more compact network architecture; (c) an online sequential ranking ELM, which updates the trained model as new training data arrives. Experimental results demonstrate that the proposed ranking ELM algorithms achieve comparable or better performance than state-of-the-art ranking algorithms.

(3) A data cleaning and ranking learning algorithm based on deep features is proposed. To address the heavy noise in large-scale face datasets, we use deep network features to construct a face similarity graph and then apply a community detection algorithm to remove the noisy data contained in discrete communities. This method preserves the diversity of the dataset while cleaning it. In addition, multi-scale features are used to improve the representation ability of the network, and the network is optimized directly with a triplet loss function based on reference points to obtain a more discriminative representation. The experimental results show that the dataset obtained by community-based face data cleaning keeps as much data diversity as possible while guaranteeing high cleanliness. The face recognition model achieves 99.67% on the LFW dataset, which the author reports as the highest face recognition accuracy obtained with a single deep network.

(4) A ranking algorithm based on multi-output extremely randomized trees is proposed. In software verification, theorem proving and microprocessor verification, it is difficult to choose an appropriate constraint solver. We use information retrieval techniques based on learning to rank to select the satisfiability solver, so that the most suitable solver can be chosen for each constraint problem. Three multi-output learning methods are used to predict the performance of the candidate algorithms: (a) multi-output regressor stacking; (b) multi-output extremely randomized trees; (c) hybrid single-output and multi-output trees. The experimental results on 11 SAT datasets and 5 MaxSAT datasets indicate that the proposed methods perform better than state-of-the-art solver selection methods.
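
The kernel approximation idea in contribution (1) can be illustrated with a small sketch. It is not the dissertation's implementation: it assumes an RBF kernel exp(-gamma * ||x - y||^2), uses only random Fourier features (not the Nyström method), and omits the truncated Newton optimizer. The names make_rff_map, rbf_kernel and pairwise_squared_hinge are hypothetical and chosen for illustration only.

```python
import math
import random


def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def make_rff_map(dim, n_features, gamma, seed=0):
    """Build an explicit feature map z(x) with z(x).z(y) ~ exp(-gamma*||x-y||^2).

    Assumption: RBF kernel. Its random frequencies are Gaussian with
    per-coordinate variance 2*gamma; phases are uniform on [0, 2*pi).
    """
    rng = random.Random(seed)
    scale = math.sqrt(2.0 * gamma)
    freqs = [[rng.gauss(0.0, scale) for _ in range(dim)]
             for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    norm = math.sqrt(2.0 / n_features)

    def transform(x):
        return [norm * math.cos(dot(w, x) + b) for w, b in zip(freqs, phases)]

    return transform


def rbf_kernel(x, y, gamma):
    """Exact RBF kernel value, used here only to check the approximation."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def pairwise_squared_hinge(w, z_preferred, z_other):
    """Squared hinge loss for one preference pair under a linear scorer w.

    The loss is zero once the preferred item outscores the other by >= 1.
    """
    margin = dot(w, z_preferred) - dot(w, z_other)
    return max(0.0, 1.0 - margin) ** 2


if __name__ == "__main__":
    phi = make_rff_map(dim=3, n_features=4000, gamma=0.5)
    x, y = [0.3, -0.2, 0.5], [0.1, 0.4, -0.3]
    print("exact kernel :", rbf_kernel(x, y, 0.5))
    print("RFF estimate :", dot(phi(x), phi(y)))
```

Because the mapped features are explicit vectors, a linear ranking model can be trained on them directly and the n-by-n kernel matrix never has to be formed, which is the source of the training-time savings the abstract describes. The approximation error shrinks as n_features grows.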
Keywords/Search Tags: Information Retrieval, Learning to Rank, Support Vector Machine, Extreme Learning Machine, Deep Learning, Data Cleaning, Face Recognition, Solver Selection