Research On Training Learning To Rank Algorithm With Heterogeneous Data

Posted on: 2017-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: T Zhang
Full Text: PDF
GTID: 2308330485980611
Subject: Computer software and theory
Abstract/Summary:
Ranked data is limited, whereas classified data is abundant and easy to obtain. We therefore define a new setting in which both ranked and classified data serve as training data for a ranking algorithm.

We propose a framework for training learning-to-rank algorithms with heterogeneous data, using both classified and ranked data to train a text ranker. In this framework, classified and ranked data are both transformed into preferences between pairs of data points, just as Pairwise algorithms transform the learning-to-rank problem into classification of the preference between a pair of data points. A Pairwise algorithm can therefore be modified to train on heterogeneous data.

We use a directed graph to describe the preferences intuitively. Pairwise learning-to-rank algorithms are based on the preference between pairs of data points, and classified data also contains a preference: the positive class over the negative class. In this setting we add classified data to the ranked data to obtain more preference information and so improve the performance of Pairwise learning-to-rank algorithms.

We transform standard datasets to simulate the real situation. In the experiments, we convert the MQ2007, MQ2008 and OHSUMED datasets into heterogeneous datasets and train a RankSVM algorithm, adapted to the new setting, on ranked and classified data mixed in a given proportion. Comparing the algorithm trained only on ranked data with the algorithm trained on heterogeneous data shows the expected improvement.

The experimental results show that on the OHSUMED dataset the heterogeneous data improves MAP by 12.4% and NDCG by 22.8%. On the MQ2007 and MQ2008 datasets the improvement is less significant.
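The pair construction described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the toy feature vectors and the plain SGD on a pairwise hinge loss (standing in for a real RankSVM solver) are all assumptions made for illustration. Ranked data yields a preference for every within-query pair with different relevance grades, classified data yields a preference for every positive example over every negative example, and the two pair sets are pooled before training.

```python
import random

def pairs_from_ranked(queries):
    """Each query is a list of (features, relevance_grade).
    Return (preferred, other) pairs for documents of the same query."""
    pairs = []
    for docs in queries:
        for xi, yi in docs:
            for xj, yj in docs:
                if yi > yj:
                    pairs.append((xi, xj))
    return pairs

def pairs_from_classified(positives, negatives):
    """Every positive example is preferred over every negative example."""
    return [(p, n) for p in positives for n in negatives]

def train_linear_ranker(pairs, dim, epochs=50, lr=0.1, reg=0.01, seed=0):
    """SGD on the pairwise hinge loss max(0, 1 - w.(a - b)) with an L2 penalty.
    A simple stand-in for RankSVM training, not an exact solver."""
    rng = random.Random(seed)
    w = [0.0] * dim
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for a, b in pairs:
            diff = [ai - bi for ai, bi in zip(a, b)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            for k in range(dim):
                # The hinge term contributes only while the margin is violated.
                grad = reg * w[k] - (diff[k] if margin < 1 else 0.0)
                w[k] -= lr * grad
    return w

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

# Toy data (invented): one ranked query with graded relevance,
# plus a small classified set of positive and negative examples.
ranked_queries = [[([1.0, 0.2], 2), ([0.6, 0.9], 1), ([0.1, 0.5], 0)]]
positives = [[0.9, 0.1], [0.8, 0.4]]
negatives = [[0.2, 0.3], [0.0, 0.8]]

# Heterogeneous training set: preferences from both sources, pooled.
pairs = pairs_from_ranked(ranked_queries) + pairs_from_classified(positives, negatives)
w = train_linear_ranker(pairs, dim=2)
print(len(pairs), "preference pairs; learned weights:", w)
```

Pooling the pairs treats a preference derived from class labels exactly like one derived from a ranking. Whether the two sources should be weighted differently, and in what proportion they are mixed, is a design choice this sketch leaves open.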
Keywords/Search Tags: Learning to Rank, Information Retrieval, RankSVM, Pairwise