
Study On Learning To Rank And Query Reformulation Based Information Retrieval Model

Posted on: 2018-06-22
Degree: Master
Type: Thesis
Country: China
Candidate: W Yang
GTID: 2348330533461368
Subject: Computer application technology
Abstract/Summary:
With the explosive growth of digital information, we urgently need newer, more dynamic ways to find information. This need arises not only on the Internet but also in document management at governments, schools, and large companies, because the volume of data stored on desktop computers has grown along with hard disk capacity. Today, these institutions and companies generally search their documents with database LIKE statements, or build their own knowledge management systems on top of an information retrieval library such as Lucene. However, the search function of these methods relies on simple keyword matching, and the search results are not ideal. A study of the development of information retrieval technology shows that query reformulation and deep mining of text semantics are more effective ways to improve retrieval. To improve information retrieval in these settings, we propose a full-text information retrieval model that combines learning to rank and query reformulation. The main research work is as follows:

(1) ListGate uses word2vec and LDA, both of which can be used to mine text semantics. Word2vec is used to reformulate the original query in the word vector space, and the topic features generated by LDA are integrated into the learning to rank method to improve the search results. To address topic drift in query reformulation, the distance between the original query and the reformulated query is measured in the semantic spaces generated by word2vec and LDA.

(2) Building on the ListNet learning to rank method, this paper proposes the ListSum model, which does not account for topic drift. Then, by analyzing the shortcomings of ListSum, we further propose the ListGate model, which does account for topic drift. We also give the scoring rules for TF-IDF, BM25, the language model, and the LDA topic model, as well as the query reformulation algorithm and the calculation of the topic drift features between the original query and the reformulated query.

(3) We implement, in Java, the iterative optimization of the network weights of the feedback neural network under stochastic gradient descent, and carry out experiments on data sets. The experimental results show that the scoring network with an LDA feature performs better than the scoring network without one. Moreover, the scoring function produced with the topic drift feature performs better than the linear addition of the original query and the reformulated query. The LDA feature and the per-query weights influenced by the topic drift feature therefore both play a positive role in the final retrieval effect.
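
The contrast the abstract draws between linear addition (ListSum) and drift-aware weighting (ListGate) can be illustrated with a minimal Python sketch. It is not the thesis implementation, which was written in Java and is not reproduced here. The function names, the sigmoid gate, and the parameters `gate_weight` and `gate_bias` are all assumptions made for illustration; only the ListNet top-one cross-entropy loss and the idea of a distance-based topic drift feature come from the text.

```python
import math


def softmax(scores):
    """Turn a list of scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_top_one_loss(predicted_scores, relevance_labels):
    """ListNet top-one loss: cross-entropy between the top-one
    distributions of the labels and of the predicted scores."""
    p_true = softmax(relevance_labels)
    p_pred = softmax(predicted_scores)
    return -sum(t * math.log(p) for t, p in zip(p_true, p_pred))


def cosine_distance(u, v):
    """Hypothetical topic drift feature: cosine distance between the
    vectors of the original and the reformulated query."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def sum_scores(orig_scores, reform_scores):
    """Linear addition of the two queries' document scores,
    with no notion of topic drift."""
    return [a + b for a, b in zip(orig_scores, reform_scores)]


def gated_scores(orig_scores, reform_scores, drift, gate_weight, gate_bias):
    """Assumed gate: a sigmoid of the drift feature decides how much
    weight the reformulated query receives for each document."""
    g = 1.0 / (1.0 + math.exp(-(gate_weight * drift + gate_bias)))
    return [(1.0 - g) * a + g * b for a, b in zip(orig_scores, reform_scores)]
```

With a negative `gate_weight`, a large drift value pushes the gate toward zero, so a reformulated query that has drifted away from the original contributes little to the final score. Plain addition would count it in full.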
Keywords/Search Tags: information retrieval, learning to rank, query reformulation, topic drift, LDA