
Researches On Information Retrieval Model Based On The Algorithm Of Learning To Rank

Posted on: 2019-09-05
Degree: Master
Type: Thesis
Country: China
Candidate: D Y Xiong
Full Text: PDF
GTID: 2428330590965831
Subject: Control Science and Engineering
Abstract/Summary:
With the rapid development of science and technology, the volume of data is growing in every field, and effective information retrieval techniques are receiving more and more attention. Traditional retrieval algorithms, which rank either by document content or by the importance of the documents themselves, struggle to meet users' requirements for high-precision retrieval. Because these two kinds of methods are strongly complementary, learning to rank emerged as a technique that integrates such methods as features and learns a ranking model from them. According to the form of the training samples, learning-to-rank algorithms can be divided into three categories: the Pointwise approach, the Pairwise approach, and the Listwise approach. Building on previous research, this thesis studies the learning models used by the Pairwise and Listwise approaches, as well as the loss function of the Pairwise approach. The work covers the following two aspects.

First, existing learning-to-rank algorithms based on neural networks initialize the connection weights randomly, so training easily falls into a local optimum and takes a long time. This thesis proposes a new learning-to-rank algorithm based on multilayer Restricted Boltzmann Machines (RBMs). To match the data type of the samples, a real-valued RBM is used as the first layer of the network and binary RBMs are used for the remaining layers. The last RBM has only a single hidden node, and the sampled activation value of this node is taken as the initial ranking score of the sample. The connection weights of the network are first initialized by unsupervised pretraining of the multilayer RBM. Separate loss functions are then defined for the Pairwise and Listwise approaches. Finally, backpropagation is used to fine-tune the parameters and obtain the final model. Compared with existing Pairwise and Listwise algorithms based on neural networks, experimental results on the OHSUMED and MQ2008 datasets show that the proposed method optimizes the network parameters effectively and clearly improves the accuracy of the final model.

Second, existing Pairwise methods based on cross entropy have three problems: the loss function over preference document pairs has no fixed bound, the loss is not computed at the query level, and no extra attention is paid to documents at the top of the ranking. All three are inconsistent with the evaluation criteria of information retrieval. Taking into account document pairs with the same label (ties), this thesis proposes a new algorithm based on a probabilistic framework, with a piecewise loss function that has a fixed bound. In addition, a preference weight is defined according to the relevance difference between the documents in each pair. Mini-batch gradient descent is then used to optimize the model. Compared with several popular ranking algorithms, experimental results on the OHSUMED and MQ2008 datasets show that the proposed method optimizes the parameters effectively and greatly improves the precision of the final model.
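The following is a minimal sketch of the scoring structure and the cross-entropy pairwise loss discussed in the abstract; it is not the thesis's implementation. Several points are assumptions made for illustration: the function names (`init_layers`, `score`, `pairwise_cross_entropy`) are hypothetical, small random weights stand in for weights that RBM pretraining would supply, the real-valued first layer is treated as having unit variance so that every layer reduces to a sigmoid activation, and the deterministic activation probability is used in place of a sampled value. The loss shown is the standard RankNet-style cross entropy, not the bounded piecewise loss proposed in the thesis, whose form the abstract does not give.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def init_layers(sizes, seed=0):
    # Assumption: small random weights stand in for RBM-pretrained weights.
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.gauss(0.0, 0.1) for _ in range(n_out)] for _ in range(n_in)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def score(layers, features):
    # Propagate the feature vector through the stack. The last layer has a
    # single hidden unit, whose activation is used as the ranking score.
    h = features
    for weights, biases in layers:
        h = [
            sigmoid(sum(h[i] * weights[i][j] for i in range(len(h))) + biases[j])
            for j in range(len(biases))
        ]
    return h[0]


def pairwise_cross_entropy(s_i, s_j, target):
    # RankNet-style loss on the score difference.
    # target = 1.0 if document i is preferred, 0.0 if j is, 0.5 for a tie.
    diff = s_i - s_j
    # Stable evaluation of log(1 + exp(-diff)), i.e. -log(sigmoid(diff)).
    neg_log_p = max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    # -log(1 - sigmoid(diff)) equals diff + log(1 + exp(-diff)).
    return target * neg_log_p + (1.0 - target) * (diff + neg_log_p)


if __name__ == "__main__":
    layers = init_layers([4, 3, 1])  # last layer: one hidden unit
    s_a = score(layers, [0.1, 0.5, -0.2, 1.0])
    s_b = score(layers, [0.9, 0.0, 0.3, -0.4])
    print(pairwise_cross_entropy(s_a, s_b, 1.0))
```

For a badly misordered pair with unbounded raw scores, this loss grows roughly linearly with the score gap, which is one way to see the "no fixed bound" issue the abstract raises. A tie (`target = 0.5`) with equal scores gives a loss of log 2.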
Keywords/Search Tags: information retrieval, learning to rank, Restricted Boltzmann Machine, bounded loss, preference weight