The Research On Learning To Rank Algorithm Based On Topic Similarity

Posted on: 2017-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
GTID: 2308330485471016
Subject: Computer Science and Technology
Abstract/Summary:
The emergence of search engines has greatly improved the efficiency of access to information. How to move the information that users care about and need most to the front of a massive set of search results, however, remains one of the core problems in search engine research. Optimizing the pages that occupy the top positions of the search results therefore has considerable research and commercial value.

Learning to rank uses machine learning algorithms to solve the document ranking problem, and it is useful for document retrieval, collaborative filtering, and many other applications. Building on existing listwise learning-to-rank methods, this thesis proposes a new method that uses the text similarity between documents to improve the scoring function of the original algorithm and thereby the ranking performance of the model. The major contributions are the following three:

1) A new scoring method for the listwise approach. It introduces text similarity between documents as a new metric, extending the scoring function from query-document scoring alone to one in which documents also vote for each other according to their similarity. The new metric exploits the inherent correlation between documents and the characteristics of their text, so the ranking problem is considered from a more general and comprehensive perspective and the resulting order is more reasonable.

2) A new similarity model that combines the vector space model (VSM) and latent Dirichlet allocation (LDA) to measure the similarity between texts at both the word level and the topic level. Each model compensates for the shortcomings of the other, which improves the computed similarity.

3) An experimental evaluation. The results show that the proposed ListSimi algorithm outperforms the existing algorithms on the OHSUMED and TD2003 data sets.
ListSimi can significantly improve the accuracy of existing learning-to-rank algorithms, especially at the top of the ranked document list. Returning correct top pages is crucial for a commercial search engine, because the quality of the top pages it returns directly affects the user's search experience and satisfaction.
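
The two ideas summarized above, a similarity that mixes a word-level (VSM) view with a topic-level (LDA) view, and a scoring function in which similar documents vote for each other, can be illustrated with a small sketch. The abstract does not give the thesis's actual formulas, so everything below is an assumption made for illustration: the function names, the linear mixing weight `alpha`, the voting weight `beta`, and the similarity-weighted average used as the vote are hypothetical and are not the ListSimi definition. The term vectors stand in for VSM weights (for example TF-IDF) and the topic vectors stand in for LDA topic distributions, both assumed non-negative and computed elsewhere.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_similarity(term_u, term_v, topic_u, topic_v, alpha=0.5):
    """Mix word-level and topic-level similarity (assumed linear combination).

    term_*  : VSM-style term-weight vectors (hypothetical inputs).
    topic_* : LDA-style topic distributions (hypothetical inputs).
    alpha   : assumed weight of the word-level view, between 0 and 1.
    """
    return alpha * cosine(term_u, term_v) + (1.0 - alpha) * cosine(topic_u, topic_v)


def rescore_with_votes(base_scores, sim, beta=0.3):
    """Blend each document's own score with a vote from the other documents.

    base_scores : query-document scores from an existing ranking model.
    sim         : square matrix of non-negative document-document similarities.
    beta        : assumed share of the final score that comes from the vote.
    The vote is the similarity-weighted mean of the other documents' scores.
    """
    n = len(base_scores)
    rescored = []
    for i in range(n):
        weight = sum(sim[i][j] for j in range(n) if j != i)
        if weight == 0.0:
            # No similar neighbours: keep the original score unchanged.
            rescored.append(base_scores[i])
            continue
        vote = sum(sim[i][j] * base_scores[j] for j in range(n) if j != i) / weight
        rescored.append((1.0 - beta) * base_scores[i] + beta * vote)
    return rescored


if __name__ == "__main__":
    # Toy data: documents 0 and 1 are near-duplicates, document 2 is unrelated.
    term_vecs = [[1.0, 0.0, 1.0], [1.0, 0.0, 0.9], [0.0, 1.0, 0.0]]
    topic_vecs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
    base = [0.2, 0.9, 0.5]
    sim = [
        [combined_similarity(term_vecs[i], term_vecs[j], topic_vecs[i], topic_vecs[j])
         for j in range(3)]
        for i in range(3)
    ]
    print(rescore_with_votes(base, sim))
```

In this toy setup, document 0 has a low base score but closely resembles the high-scoring document 1, so the vote pulls its score upward. A weighted average is only one possible way to let documents vote; the thesis may normalize or combine the terms differently.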
Keywords/Search Tags: Information retrieval, Learning to rank, Topic model, Text similarity