Research on Learning to Rank in Information Retrieval

Posted on: 2018-12-13
Degree: Master
Type: Thesis
Country: China
Candidate: W Lei
Full Text: PDF
GTID: 2428330512466966
Subject: Communication and Information System
Abstract/Summary:
Since its inception, information retrieval has been a focus of research. With the rapid development of modern Internet technology, the volume of data people produce is growing at high speed. How to quickly find the information users need most in this massive data has become the central question in the field. Solving it requires a good ranking model, one that lists the information users need most at the top. In early information retrieval research, work on the ranking problem focused on analyzing the relevance between query and document, which gave rise to ranking methods such as the Boolean model and the vector space model. As the Web developed, the problem became one of finding the most relevant web pages in web search, and retrieval models based on link analysis emerged, including PageRank and HITS. Each of these models has its own advantages and disadvantages, and a retrieval system chooses one ranking strategy according to the demands of its working environment. Such a single ranking strategy finds it increasingly difficult to meet people's needs. In response, some scholars have studied various ranking algorithms together in order to obtain a single strategy with better performance. Learning to rank (L2R) was developed in this context. L2R uses machine learning methods to solve the ranking problem: it automatically generates a ranking model by training on existing data. The resulting model can meet users' needs better than a traditional model. Building on previous studies, this thesis mainly studies feature extraction and the ranking model algorithm:

(1) Feature selection has a great influence on the results of the ranking model in L2R. At present, the features used in L2R are generally the scores of traditional retrieval models, and there has been little research on feature selection for L2R. For these reasons, this thesis studies feature extraction from two aspects. First, recognizing the importance of smoothing in the language model, it proposes a feature extraction method based on a multi-parameter language model. Second, after a detailed analysis of the principle of the CBOW model in NNLM, it derives a method for extracting semantic features of a document. This method builds on the original CBOW model and adds the document vector to the input. Experimental results on the LETOR 4.0 dataset show that the two new features can improve query accuracy.

(2) Among current learning-to-rank algorithms, LambdaMART has proved to be an excellent algorithm on many occasions. Because LambdaMART is a gradient boosting algorithm, it has a problem: the step size and the number of iterations are hard to balance. To reach a true global minimum, the step size must be very small and the number of iterations very large, so training the model takes a long time. To address this, the thesis presents the iLambdaMART algorithm, which uses a random forest model as the initial model of LambdaMART. This both avoids overfitting as far as possible and greatly reduces the number of iterations. On the experimental datasets Yahoo LTRC and MSLR, the new algorithm performs well on both evaluation metrics, ERR and nDCG.
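
For readers unfamiliar with the two evaluation metrics named above, the following is a minimal sketch of how nDCG and ERR are commonly computed for one query. It is not taken from the thesis. It assumes graded relevance labels from 0 to 4 (the scale used by Yahoo LTRC and MSLR), the exponential gain 2^g - 1, and a log2 rank discount; the function names and the max_grade parameter are illustrative choices.

```python
import math


def dcg(gains, k):
    """Discounted cumulative gain over the top-k graded labels."""
    # Rank i (0-based) is discounted by log2(i + 2), so rank 1 has discount 1.
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg(gains, k):
    """DCG of the given ordering divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0


def err(gains, k, max_grade=4):
    """Expected reciprocal rank under the cascade user model."""
    p_continue = 1.0  # probability the user has not stopped before this rank
    score = 0.0
    for rank, g in enumerate(gains[:k], start=1):
        # Probability that the document at this rank satisfies the user.
        r = (2 ** g - 1) / (2 ** max_grade)
        score += p_continue * r / rank
        p_continue *= 1 - r
    return score


if __name__ == "__main__":
    # Labels of the documents in the order a hypothetical ranker returned them.
    ranked_labels = [3, 0, 4, 1, 2]
    print("nDCG@5:", ndcg(ranked_labels, 5))
    print("ERR@5: ", err(ranked_labels, 5))
```

Both functions take the labels in ranked order, so comparing two rankers on the same query only requires reordering the label list. nDCG rewards placing high grades early relative to the best possible ordering, while ERR additionally discounts documents that appear after an already satisfying result.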
Keywords/Search Tags: Information Retrieval, Learning to Rank, Machine Learning, Random Forest, NNLM