
Top-k Learning To Rank Based On Similarity Between Documents

Posted on: 2015-08-07
Degree: Master
Type: Thesis
Country: China
Candidate: X Xiao
GTID: 2298330422990413
Subject: Computer Science and Technology
Abstract/Summary:
As demand for information on the Internet grows, retrieving information accurately and efficiently from search engines has become an active research topic, and ranking has become a vital part of the search engine. To improve user satisfaction, the accuracy of the returned results must improve; in other words, the most relevant pages should be returned to the user. How to achieve this has become a research focus for search engines. In recent years the most popular approach has been to apply machine learning to the ranking process: the factors that affect ranking are very complex, and taking them into account yields more reasonable results. This approach is known as Learning to Rank.

In practical applications such as information retrieval, recommender systems, and computational advertising, most users are concerned mainly with the first few results, so these top results are crucial to user experience and satisfaction. A Top-k ranking method is therefore proposed to meet this requirement.

We improved on a previous model. First, we added correlation information between documents, so that in Top-k modeling the documents are treated not as mutually independent but as related. The correlation between documents is used as a weight on each document's score, which lets the model exploit additional information and improves the results. With document correlation added, the proposed method optimizes the model by directly maximizing the probability of the ranking rather than by minimizing a loss function. As a result, the computation required to train the model is greatly reduced, from combinatorial to polynomial complexity.

We then tried a new variant of the Top-k model. In the original Top-k model, many documents that belong in the first k positions are wrongly placed after position k by the first layer of the process. No matter how sophisticated the second-layer algorithm is, or how much additional information it uses, the overall Top-k ranking quality then does not improve. To address this, we increased K in the first layer while keeping it small relative to the size N of the related document collection. More of the documents that truly belong in the first k positions are thus retained, and when complex algorithms are applied for ranking in the second layer, accuracy is greatly improved.
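The abstract gives no formulas, so the following Python sketch is only one possible reading of its two ideas, not the thesis's actual model. It assumes documents are represented as term-weight vectors (vector space model), that similarity is cosine similarity, and that a document's score is blended with a similarity-weighted average of the other candidates' scores. The function names, the mixing weight `alpha`, and the parameter `first_stage_size` (standing in for the enlarged first-layer K) are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def similarity_weighted_scores(base_scores, vectors, alpha=0.5):
    """Blend each score with a similarity-weighted average of the others.

    alpha is a hypothetical mixing weight: 0 keeps the base scores,
    1 uses only the neighbours' weighted average.
    """
    n = len(base_scores)
    adjusted = []
    for i in range(n):
        # Similarity of document i to every other document (self excluded).
        weights = [
            cosine_similarity(vectors[i], vectors[j]) if j != i else 0.0
            for j in range(n)
        ]
        total = sum(weights)
        if total > 0:
            neighbour = sum(w * s for w, s in zip(weights, base_scores)) / total
        else:
            neighbour = base_scores[i]  # no similar documents: keep own score
        adjusted.append((1 - alpha) * base_scores[i] + alpha * neighbour)
    return adjusted


def two_stage_top_k(base_scores, vectors, k, first_stage_size, alpha=0.5):
    """Return the indices of the top-k documents after two-stage ranking.

    Stage 1 keeps the first_stage_size best documents by base score
    (first_stage_size >= k). Stage 2 reranks only those candidates
    with the similarity-weighted scores and returns the best k.
    """
    if k > first_stage_size:
        raise ValueError("first_stage_size must be at least k")
    order = sorted(range(len(base_scores)),
                   key=lambda i: base_scores[i], reverse=True)
    candidates = order[:first_stage_size]
    adjusted = similarity_weighted_scores(
        [base_scores[i] for i in candidates],
        [vectors[i] for i in candidates],
        alpha,
    )
    reranked = sorted(range(len(candidates)),
                      key=lambda p: adjusted[p], reverse=True)
    return [candidates[p] for p in reranked[:k]]
```

Choosing `first_stage_size` larger than `k` gives the second stage a chance to recover documents that the cheap first-stage score placed just outside the top k, which is the effect the abstract attributes to increasing the first-layer K.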
Keywords/Search Tags: machine learning, learning to rank, vector space model, similarity between documents, NDCG