
Knowledge Transfer For Cross Domain Learning To Rank

Posted on: 2011-10-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D P Chen
GTID: 1118360305966708
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the ability to access information effectively has become crucial. Search engines partially solve this problem and serve as the main entry point to the Internet. Acquiring ranking models for search engines has recently become a research hotspot. In the search industry, the quality of the top results strongly influences user satisfaction, which in turn is closely tied to market share. This dissertation uses search engines as its main scenario.

Most mainstream search engines present search results as a ranked list. The top results are assumed to be the most relevant ones and thus to meet the users' information needs, which are expressed through queries. In recent years, large-scale data processing and machine learning techniques have been widely applied to acquiring ranking models for search engines, giving rise to the term "learning to rank". Many methods have been proposed, and some (e.g., Ranking SVM) have been successfully deployed in industry. Almost all of these methods fall under supervised learning: to obtain a reliable ranking model, we must label a large amount of training data, feed it to a specific learner, and go through a training phase.

In real-world applications of learning to rank, labeled data are usually scarce or even absent. With existing learning-to-rank methods, we cannot guarantee the reliability and generalization ability of the learned ranking model without sufficient training data, which limits the practical use of any specific method. Fortunately, besides the labeled data in the target domain, we may obtain additional labeled data from a different but related domain, which we call the source domain. This dissertation focuses on how to exploit these source-domain data to improve the ranking model learned in the target domain.

To address the lack of labeled data in learning to rank, we make use of source-domain data, introduce transfer learning techniques, and propose a new problem, "cross-domain learning to rank". After carefully studying the underlying assumptions, loss functions, optimization formulations, and learning algorithms, we perform cross-domain learning to rank at both the instance level and the feature level, and finally study the application of the proposed methods in document retrieval and vertical search. For instance-level transfer ranking, we first propose a heuristic method, TransRank, which preprocesses the labeled data from the source domain and then feeds them, together with the target-domain data, into the learner to obtain a ranking model. We then propose an improved probabilistic method, CLRankins. For feature-level transfer ranking, we formalize a unified optimization problem based on the underlying assumption and convert it into a procedure in which two sets of variables are optimized iteratively. Moreover, we study its relationship with Ranking SVM and prove that this optimization problem can be solved using Ranking SVM as the base learner. We name this method CLRankfeat.

Cross-domain learning to rank has great potential in document retrieval. In this dissertation, we validate our methods through experiments on several benchmark datasets. The results show that all three methods can improve ranking performance in the target domain. In particular, CLRankfeat gains a 5-15% performance improvement on all experimental datasets, while TransRank and CLRankins achieve only limited improvement on some of the datasets. We further compare the three methods in terms of sensitivity and robustness.
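To make the instance-level idea above concrete, the following is a minimal sketch of a TransRank-style pipeline as described in this abstract: source-domain labeled data are preprocessed (the concrete selection or weighting criterion is not given here, so a trivial pass-through is used as a placeholder), pooled with target-domain data, and fed to a pairwise learner. The pairwise learner is the standard Ranking SVM reduction to a linear SVM over feature-difference vectors; all function and variable names are illustrative and are not the dissertation's actual implementation.

import numpy as np
from sklearn.svm import LinearSVC

def to_pairs(X, y, qid):
    # Standard Ranking SVM reduction: for documents of the same query,
    # every pair with different relevance labels yields a difference
    # vector x_i - x_j labeled +1 (i preferred) or -1 (j preferred).
    Xp, yp = [], []
    for q in np.unique(qid):
        idx = np.flatnonzero(qid == q)
        for i in idx:
            for j in idx:
                if y[i] > y[j]:
                    Xp.append(X[i] - X[j]); yp.append(+1)
                    Xp.append(X[j] - X[i]); yp.append(-1)
    return np.asarray(Xp), np.asarray(yp)

def preprocess_source(X_src, y_src, qid_src):
    # Placeholder for TransRank's preprocessing of source-domain data;
    # the actual criterion is not specified in the abstract, so all
    # source instances are kept unchanged here (an assumption).
    return X_src, y_src, qid_src

def train_pooled_ranker(X_src, y_src, qid_src, X_tgt, y_tgt, qid_tgt, C=1.0):
    Xs, ys, qs = preprocess_source(X_src, y_src, qid_src)
    Xp_s, yp_s = to_pairs(Xs, ys, qs)
    Xp_t, yp_t = to_pairs(X_tgt, y_tgt, qid_tgt)
    X_all = np.vstack([Xp_s, Xp_t])
    y_all = np.concatenate([yp_s, yp_t])
    model = LinearSVC(C=C).fit(X_all, y_all)  # linear Ranking SVM surrogate
    return model.coef_.ravel()                # w: score(d) = w . x_d

Documents for a new query would then be ranked by the score w . x. This sketch ignores the probabilistic weighting of CLRankins and the iterative feature-level optimization of CLRankfeat, which require the full dissertation for a faithful implementation.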
Vertical search is another important scenario for cross-domain learning to rank. There is often not enough time to learn a high-quality ranking model for a newly launched vertical search product; however, we can exploit the labeled data from an existing vertical search to help the new one construct its own ranking model. In the experiments, we use click-through log data from a commercial search engine, extract rank-related features, and set up the datasets. Experimental results show that TransRank achieves better ranking performance in news search while saving 80% of the human labeling effort. Furthermore, we analyze and discuss how different features behave in the learning process.
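The abstract does not list which rank-related features are extracted from the click-through logs, so the following sketch only illustrates one common click-derived statistic, the per (query, URL) click-through rate, computed from a hypothetical log format; the field names and the feature choice are assumptions made for illustration.

import csv
from collections import defaultdict

def clickthrough_rate_features(log_path):
    # Hypothetical log format: one row per impression with the fields
    # "query", "url", and "clicked" (0/1). Real search-engine logs differ.
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            key = (row["query"], row["url"])
            impressions[key] += 1
            clicks[key] += int(row["clicked"])
    # Click-through rate as one rank-related feature per (query, url) pair.
    return {key: clicks[key] / impressions[key] for key in impressions}

Such per-pair statistics would then be joined with other query-document features to form the training instances for the vertical-search datasets.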
Keywords/Search Tags:learning to rank, transfer learning, Ranking SVM, vertical search engine, document retrieval, optimization problem, data preprocessing