
Research Of Transfer Learning And Its Application In Classifying Cross-domain Data

Posted on: 2012-02-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J W Qin
Full Text: PDF
GTID: 1488303356993189
Subject: Computer application technology
Abstract/Summary:
The rapid development of information technology enables people to obtain ever more information, and learning knowledge effectively from that information has become increasingly important. As one of the principal means of data mining and knowledge discovery, machine learning has attracted a great deal of attention. However, the performance of machine learning approaches hits a bottleneck as they are increasingly used in practical applications. One of the most important reasons is that many machine learning approaches rest on a strict assumption: the training data and the test data must be drawn from the same underlying distribution. This assumption brings many problems, such as the expiration of training data, the expiration of trained models, and the high cost of labeling training data, all of which greatly reduce the efficiency of data analysis. A distribution bias between training and test data occurs frequently in practical applications, yet it is usually ignored in research.

Transfer learning relaxes the assumption that training and test data must come from the same distribution. It uses data from a different domain to help the target task, thereby solving learning problems in which the training and test data have different distributions. At present, transfer learning attracts considerable attention and has achieved plentiful success.
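As a minimal synthetic illustration of why the same-distribution assumption matters (the data, the cluster centers, and the nearest-centroid classifier here are invented for this sketch and are not taken from the dissertation), the following snippet trains a model on a source domain and evaluates it on a target domain whose distribution has drifted:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(centers, n=200):
    """Two Gaussian clusters, one per class, centered at `centers`."""
    X = np.vstack([rng.normal(c, 1.0, size=(n, 2)) for c in centers])
    y = np.repeat(np.arange(len(centers)), n)
    return X, y

def fit_centroids(X, y):
    """Nearest-centroid 'model': one mean vector per class."""
    return np.array([X[y == k].mean(axis=0) for k in np.unique(y)])

def predict(centroids, X):
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

# Source domain: classes centered at (-2, 0) and (2, 0).
Xs, ys = make_data([(-2.0, 0.0), (2.0, 0.0)])
# Target domain: same task, but the clusters have drifted to (1, 0) and (5, 0).
Xt, yt = make_data([(1.0, 0.0), (5.0, 0.0)])

centroids = fit_centroids(Xs, ys)
src_acc = (predict(centroids, Xs) == ys).mean()
tgt_acc = (predict(centroids, Xt) == yt).mean()
print(f"accuracy on source: {src_acc:.2f}")  # high: test matches training distribution
print(f"accuracy on target: {tgt_acc:.2f}")  # degrades under distribution shift
```

The drop in target accuracy is exactly the failure mode the dissertation addresses: the model itself is fine, but the distribution it was trained on has expired.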
Yet, investigating the current approaches reveals several open problems: how to perform instance transfer when there is no labeled data in the target domain; how to avoid the difficulty of estimating the parameters of the domain distributions; how to control negative transfer when the domains are very different; and how to balance the generalization and adaptation of transfer learning algorithms.

Focusing on these problems, and building on an analysis and summary of existing approaches, this dissertation investigates how to build a transfer environment and how to construct classification models in the transfer setting by means of instance transfer and feature transfer, with the aim of exploiting data from related domains to improve the classification accuracy of the target task. The main contributions of this dissertation are as follows:

1. An approach called Multi-step Bridged Refinement for Transfer Learning (MSBR), based on dynamic datasets, is proposed. It solves the problem of reusing the training data when the target domain has no labeled data: a bridge between the source and target data is constructed from a mixed dataset whose distribution varies dynamically from the source data to the target data, the original task is decomposed into several subtasks along this bridge, and discriminative knowledge is thereby transferred effectively.

2. To overcome the difficulty of estimating the parameters of the domain distributions when labeled data in the target domain is scarce, an approach called Revised Embedding based Transfer Learning (RETR) is proposed. It selects data from the source domain by combining a handful of labeled data with a large amount of unlabeled data in the target domain: the unlabeled data are used to construct an embedding space into which the source data are mapped and filtered.
Then, an iterative optimization process revises the embedding space using the winning source instances, and the discriminative information of the source data influences the partition structure of the target data.

3. Starting from the eigenspace, the underlying structure of the data is investigated and an approach called Feature Alignment based Transfer Learning (FATL) is proposed. By solving, under constraints, for a common space that aligns the source and target data in the feature space, the distribution gap between the domains is eliminated and the traditional assumption that training and test data come from an identical distribution is satisfied, so that the transfer problem reduces to a standard machine learning problem.

4. A unified optimization of instance transfer and feature transfer is proposed. Instance transfer methods adapt well because they optimize for the specific characteristics of the target domain, while feature transfer methods generalize well because they exploit the characteristics common to the domains. To improve the practicality of transfer methods, both diversity and similarity are taken into account in an approach called Transfer with Instance Level Constraints and Feature Level Relationship (TICFR), which introduces a new optimization function combining constraint conditions converted from instance-level and feature-level information. Experimental comparisons show that this method balances generalization and adaptation in the transfer process.

5. In view of the negative transfer that arises in many transfer problems, the similarity between domains is measured quantitatively, and a means of avoiding negative transfer is proposed by introducing a transfer risk into the transfer process in combination with TICFR.
When applied to cross-domain classification problems in which the domain distributions are very different, the transfer-risk-based method avoids negative transfer and limits the influence of adverse factors from the source data.
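The feature-alignment idea behind FATL, mapping both domains toward a common space so that the distribution gap disappears before a standard classifier is trained, can be loosely illustrated with a much simpler, well-known statistic-matching trick: aligning the source features to the target's mean and covariance (CORAL-style second-order alignment plus mean matching). This is a hedged sketch on invented data, not the dissertation's FATL algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(centers, n=200):
    X = np.vstack([rng.normal(c, 1.0, size=(n, 2)) for c in centers])
    y = np.repeat(np.arange(len(centers)), n)
    return X, y

def sqrtm_sym(A):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

def align(Xs, Xt, eps=1e-5):
    """Map source features onto the target's mean and covariance."""
    Cs = np.cov(Xs, rowvar=False) + eps * np.eye(Xs.shape[1])
    Ct = np.cov(Xt, rowvar=False) + eps * np.eye(Xt.shape[1])
    whiten = sqrtm_sym(np.linalg.inv(Cs))   # remove source covariance
    recolor = sqrtm_sym(Ct)                 # impose target covariance
    return (Xs - Xs.mean(axis=0)) @ whiten @ recolor + Xt.mean(axis=0)

def fit_centroids(X, y):
    return np.array([X[y == k].mean(axis=0) for k in np.unique(y)])

def predict(centroids, X):
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

# Source and target share the task, but the clusters have shifted.
Xs, ys = make_data([(-2.0, 0.0), (2.0, 0.0)])
Xt, yt = make_data([(1.0, 3.0), (5.0, 3.0)])

before = (predict(fit_centroids(Xs, ys), Xt) == yt).mean()
after = (predict(fit_centroids(align(Xs, Xt), ys), Xt) == yt).mean()
print(f"target accuracy, raw source model:     {before:.2f}")
print(f"target accuracy, aligned source model: {after:.2f}")
```

Note the caveat that motivates contribution 5: statistic matching uses no target labels, so when the domains are genuinely dissimilar it can align the wrong structures and make things worse, which is the negative-transfer situation the transfer-risk mechanism is designed to detect and control.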
Keywords/Search Tags: Instance Transfer, Feature Transfer, Cross Domain, Classification