Domain Adaptation For Learning To Rank

Posted on: 2012-12-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P Cai
Full Text: PDF
GTID: 1488303341954119
Subject: Computer application technology
Abstract/Summary:
With the widespread use of supervised machine learning techniques in many fields, researchers have recognized that the scarcity of training data in the target domain is one of the main obstacles to quickly deploying a learned model. In recent years, how to resolve this problem has become a hot topic in several communities, such as machine learning, natural language processing, information retrieval and multimedia.

Learning to rank is one of the key problems in information retrieval. To date, techniques based on supervised learning have been regarded as the best choice for learning to rank. However, as in traditional supervised learning, ranking learning faces the same problem: the lack of training data in the target domain. For this purpose, we investigated how to effectively use labeled data from related domains to learn a model for the target domain, which is referred to as domain adaptation.

The contributions of this thesis include:

1. We proposed a framework based on document weighting for learning to rank. First, we estimate each source document's importance to the target domain using a domain separator. Then, the document weights are transformed into weights of document pairs, which can be integrated into pairwise ranking algorithms.

2. We investigated the adaptation problem of RankBoost, a well-known ranking algorithm. Within the framework of ranking adaptation based on document weighting, we proposed three versions of weight-based RankBoost, and carried out theoretical analysis and empirical comparisons.

3. We proposed to estimate each source query's importance to the target domain at the query level. In learning to rank, the learning instance is a query together with a set of retrieved documents and their relevance labels. We estimate the query importance from two distinct perspectives:
(1) The query is compressed into a feature vector, and traditional approaches are then used to estimate the query importance.
(2) For each source query, we measure its similarity to each target query, and then combine these fine-grained similarity values to estimate its importance.

4. We proposed a domain adaptation algorithm based on active learning. To obtain ranking knowledge specific to the target domain, we adopt active learning techniques to select a small number of informative target queries to label. These queries provide domain-specific knowledge that is not contained in the source domain. At the same time, we use these target queries to evaluate the importance weights of source queries, so that the source training data can be reused.

5. We applied domain adaptation to semantic entity detection, and proposed to improve the adaptation capability using domain independent features. Traditionally, only short context features are used in entity detection, and performance degrades when the genre of the test documents differs from that of the training documents. To resolve this problem, we designed a framework combining CRF and SVM. The framework can effectively integrate short context features and domain independent features, so the learned model adapts well to the target domain.

In this thesis, we studied the domain adaptation problem in learning to rank under different scenarios. When labeled data are unavailable in the target domain, we investigated weight-based adaptation for learning to rank from the instance weighting point of view. When only a limited labeling budget is available for the target domain, we explored ranking adaptation based on active learning. Additionally, we studied the application of domain adaptation to semantic entity detection; from the feature perspective, we explored the adaptation of semantic entity detection using domain independent features.

The effectiveness of all proposed algorithms is evaluated on several benchmark datasets. In real applications such as multimedia news recommendation, hot event detection, sentiment analysis, web search and vertical search, the proposed domain adaptation algorithms allow existing labeled data in related domains to be used effectively to learn a model for the target domain, so that the labeling cost can be reduced.
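
The document weighting idea in the first contribution can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not specify the form of the domain separator or how document weights become pair weights, so the sketch assumes a small logistic regression as the separator, uses its estimated probability of "target" as the document weight, and averages the two document weights to obtain a pair weight. All function names are hypothetical.

```python
import math
from itertools import combinations


def _sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_domain_separator(source_docs, target_docs, epochs=500, lr=0.1):
    """Fit logistic regression separating source (0) from target (1) documents.

    Assumption: the thesis only says "domain separator"; logistic regression
    trained by batch gradient descent is a stand-in chosen for this sketch.
    """
    data = [(x, 0.0) for x in source_docs] + [(x, 1.0) for x in target_docs]
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            err = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def document_weight(x, w, b):
    """Importance of a source document: the estimated P(target | x)."""
    return _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def weighted_pairs(query_docs, w, b):
    """Build weighted preference pairs for one source query.

    query_docs: list of (features, relevance_label).
    Returns (preferred_index, other_index, pair_weight) triples.
    Assumption: the pair weight is the mean of the two document weights.
    """
    doc_w = [document_weight(x, w, b) for x, _ in query_docs]
    pairs = []
    for i, j in combinations(range(len(query_docs)), 2):
        li, lj = query_docs[i][1], query_docs[j][1]
        if li == lj:
            continue  # equal labels give no preference
        hi, lo = (i, j) if li > lj else (j, i)
        pairs.append((hi, lo, 0.5 * (doc_w[i] + doc_w[j])))
    return pairs
```

In a pairwise ranker, each pair weight would multiply that pair's loss term, so that source pairs resembling target-domain documents contribute more to training.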
Keywords/Search Tags: Domain Adaptation, Learning to Rank, Document Weighting, Query Weighting, Active Learning, Domain Independent Feature, Semantic Entity