Research On Search Ranking Algorithm Based On Machine Learning

Posted on: 2021-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: R Cheng
Full Text: PDF
GTID: 2428330614965905
Subject: Electronic and communication engineering
Abstract/Summary:
In recent years, with the rapid development of computer and Internet technologies, human society has entered an era of information explosion. People face massive amounts of information every day, and their needs have shifted from simply acquiring information to acquiring useful information efficiently. Against this background, the continuous optimization of information retrieval technology is very important. Machine learning, as an emerging technology, has been widely applied in many areas of life, and combining it with information retrieval is an inevitable trend. The methods produced by this combination are called learning to rank. Traditional retrieval techniques cannot capture the correlations among features when the information is complex, whereas learning-to-rank methods exploit the ability of machine learning to learn autonomously and can represent the correlations among complex features well. According to how documents are processed, learning-to-rank algorithms fall into three categories: pointwise, pairwise, and listwise. This thesis studies and improves representative algorithms of the latter two categories, the RankNet algorithm and the LambdaMART algorithm. The loss function has always been the key to a learning-to-rank algorithm: it measures the degree of inconsistency between the model's predicted value and the true value, and its quality directly affects the performance of the algorithm. The research work of the thesis mainly covers the following three aspects:

(1) The thesis reviews the development history and research status of search ranking algorithms in the field of information retrieval and gives a brief description of the framework of learning-to-rank systems. The classification and evaluation indexes of learning-to-rank algorithms are studied in detail, which lays the groundwork for studying and improving the algorithms that follow.

(2) The thesis proposes a RankNet algorithm with an improved loss function. The new loss function is a linear combination of a pair-level loss function (cross entropy) and a point-level loss function (Huber), and it is used to measure the loss of the model's predictions. The original RankNet algorithm uses pairs of documents as training samples. It considers only the partial order relationship within each document pair and ignores the relevance between each document itself and the query; correspondingly, its loss function uses only cross entropy. To solve this problem, a Huber loss term is added to the cross-entropy loss to measure the prediction error of each single document, and the improved loss function improves the model's predictions. The thesis then builds a simulation platform with a BP (backpropagation) neural network and trains it with gradient descent. The simulation results show that the improved RankNet algorithm achieves higher accuracy than the original algorithm.

(3) The thesis studies the basic principle of the LambdaMART algorithm. First, the two components of the algorithm are introduced in detail: the gradient of the LambdaRank algorithm and the MART algorithm. Second, the advantage of LambdaMART as a listwise method is that, in each iteration, it fits a negative gradient that incorporates the evaluation index. The LambdaMART algorithm is simulated and compared with the LambdaRank algorithm. The simulation results show that LambdaMART performs differently under different numbers of decision trees, and that an excessive number of decision trees leads to overfitting. In addition, the more documents the algorithm takes into account, the higher the NDCG value and the better the ranking. Comparative experiments show that the LambdaMART algorithm ranks better than the LambdaRank algorithm under the evaluation index NDCG@10 (normalized discounted cumulative gain at rank 10).
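The combined loss described in aspect (2) can be sketched as follows. This is a minimal illustration and not the thesis's implementation: the abstract does not give the combination weights, the Huber threshold, or the regression target, so `alpha`, `delta`, the function names, and the choice to compare each score directly with its relevance label are all assumptions made for this example.

```python
import math


def pairwise_cross_entropy(s_i, s_j, p_target):
    """RankNet cross entropy for one document pair.

    s_i, s_j  : model scores of documents i and j
    p_target  : target probability that i should rank above j
    With P_ij = sigmoid(s_i - s_j), the loss
    -p*log(P_ij) - (1-p)*log(1-P_ij) simplifies to
    (1-p)*(s_i - s_j) + log(1 + exp(-(s_i - s_j))).
    """
    diff = s_i - s_j
    # Numerically stable log(1 + exp(-diff)).
    softplus = max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    return (1.0 - p_target) * diff + softplus


def huber(pred, target, delta=1.0):
    """Huber loss: quadratic for small errors, linear for large ones."""
    err = abs(pred - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)


def combined_pair_loss(s_i, s_j, y_i, y_j, alpha=0.5, delta=1.0):
    """Linear combination of pair-level and point-level losses.

    alpha (assumed) weights the cross-entropy term; (1 - alpha)
    weights the Huber terms of the two documents in the pair.
    """
    # Target probability derived from the relevance labels.
    if y_i > y_j:
        p_target = 1.0
    elif y_i < y_j:
        p_target = 0.0
    else:
        p_target = 0.5
    pair_term = pairwise_cross_entropy(s_i, s_j, p_target)
    # Assumption: scores are regressed directly onto relevance labels.
    point_term = huber(s_i, y_i, delta) + huber(s_j, y_j, delta)
    return alpha * pair_term + (1.0 - alpha) * point_term
```

Setting `alpha=1.0` recovers the plain RankNet pair loss, so in this sketch the Huber term acts as an optional penalty on how far each individual score drifts from its own relevance label.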
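For reference, the NDCG@k index used in the comparison can be computed as in the sketch below. The abstract does not state which gain formula the thesis uses; the exponential gain `2**rel - 1` with a `log2` position discount is a common choice and is assumed here, and the function names are illustrative.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k results.

    relevances: graded relevance labels in the order the ranker
    returned the documents (position 0 is the top result).
    """
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)  # rank is 0-based
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k=10):
    """DCG@k normalized by the DCG@k of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant documents for this query
    return dcg_at_k(relevances, k) / ideal
```

A perfectly ordered list scores 1.0, and any ordering that places a less relevant document above a more relevant one scores lower. A system-level figure is typically the mean of this value over all test queries.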
Keywords/Search Tags: Information retrieval, learning to rank, loss function, gradient, neural network