
A Study On Domain Adaptation Algorithm And Its Application

Posted on: 2015-08-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Xu
Full Text: PDF
GTID: 1488304313453304
Subject: Light Industry Information Technology and Engineering
Abstract/Summary:
Traditional machine learning algorithms assume that the training data and the test data are drawn from the same distribution, so models trained purely on the training data are applied directly to the test data. In practice this assumption is often too strong. When the instances in the two domains are drawn from different distributions, traditional machine learning cannot achieve high performance on the new domain. Domain adaptation algorithms are therefore designed to build a bridge between the training data and the test data in order to improve prediction performance on the test domain, and they are widely used to solve real-world classification, regression and probability density estimation problems in machine learning. Many researchers have studied domain adaptation in depth, obtained a number of important results and applied them widely in practice. However, many issues still need further exploration. This dissertation addresses several issues in domain adaptation from four aspects: probability density estimation, support vector domain description, classification and regression. The main contents are as follows:

1. This dissertation proposes novel domain adaptation algorithms based on the minimum enclosing ball. In many machine learning problems, incomplete data collection leads to low prediction performance, which gives rise to the domain adaptation problem. The proposed algorithms build on the theory that many kernel methods, such as support vector domain description (SVDD), support vector machine (SVM) and support vector regression (SVR), can be equivalently formulated as minimum enclosing ball (MEB) or center-constrained minimum enclosing ball (CC-MEB) problems in computational geometry.
To address how to transfer knowledge effectively between the two domains, a new theorem is established: the difference between the probability distributions of two similar domains depends only on the centers of the two domains' minimum enclosing balls. Based on this result, fast adaptive algorithms are proposed for large-scale domain adaptation. These algorithms use the center of the source domain's MEB or CC-MEB to calibrate the center of the target domain's, improving the performance of machine learning algorithms on the target domain. Experimental results show that the proposed domain adaptive algorithms can compensate for missing data and greatly improve performance on the target domain's machine learning problems.

2. A novel transfer learning algorithm based on SVM is proposed. When a task from a new domain arrives, relabeling the new domain's samples is costly, and discarding all the old domain data would be wasteful. A novel algorithm, TL-SVM, is therefore proposed on the basis of SVM. The main idea is that an SVM classifier is defined by (w, b); if two domains are related, the values of w for the two domains' classifiers should be similar. A high-performance classification model can then be built from a small number of target domain samples together with the source domain's knowledge w_s, accomplishing transfer learning between the two domains. The method inherits the advantages of the maximum-margin SVM based on empirical risk minimization and remedies the traditional SVM's inability to transfer knowledge.

The above theoretical results can be further applied to the L2 kernel classifier, which is based on the concept of the difference of densities. The L2 kernel classifier offers good classification performance and sparsity; however, its premise that the training domain and the testing domain are independent and identically distributed severely constrains its usefulness.
To overcome this shortcoming, the dissertation exploits the fact that the L2 kernel classifier is equivalent to a modified SVM, so that knowledge can be transferred through this equivalent SVM. On this basis, a novel classifier named transfer learning L2 kernel classification (TL-L2KC) is proposed. This classifier can handle inconsistencies between the training set and test set distributions caused by a slowly changing dataset or by a training set obtained under specific constraints, while inheriting the good performance of L2KC.

3. The reduced set density estimation (RSDE) algorithm provides a kernel-based density estimator that employs a small percentage of the available data sample and is optimal in the L2 sense. It provides a reduced set density estimator with accuracy comparable to that of the full-sample Parzen density estimator and better computational time, but it does not work well when the training set and the testing set are not independent and identically distributed. To overcome this limitation, a novel A-RSDE is proposed for adaptive probability density estimation. It makes full use of the source domain's (training dataset) knowledge p(x; θ1) of the probability density distribution so that the target domain's (testing dataset) probability density estimate q(x; θ2) is closer to the true probability density distribution q(x). In addition, the fast core-set-based minimum enclosing ball (MEB) approximation algorithm is introduced to develop the proposed algorithm A-FRSDE.

The RSDE and A-RSDE algorithms above can be viewed as probability density estimation in a linear combination space of densities. An approximation framework for this setting is developed, based on a linear combination of Gaussian basis functions under the integrated squared error criterion. The proposed approximation framework has three advantages.
Firstly, it can directly estimate the probability density function of the linear combination space of densities without having to estimate the probability density function of each domain, and its approximation accuracy is at least comparable to, and sometimes better than, that of traditional density estimation methods. Secondly, the time complexity of the proposed approximation framework depends on l, which is generally much smaller than the sample size, so the framework is well suited to large datasets. Thirdly, the framework can be used to develop alternative approaches to classification, data condensation, verification of independence between random variables, conditional density estimation, and similarity identification between multiple source domains and a target domain. If the linear combination space of densities is used to approximate a known space, it can be applied to estimate the source domain and target domain approximation for multi-source domain adaptive learning.
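
The central idea of TL-SVM in item 2, that the target classifier's w should stay close to the source classifier's w, can be illustrated with a minimal sketch. The abstract does not give the objective, so the form below is an assumption: a linear soft-margin SVM with an extra penalty 0.5 * mu * ||w - w_s||^2, minimised by plain subgradient descent. The function name `fit_transfer_svm`, the parameters `C`, `mu`, `lr` and `n_iter`, and the toy data are all hypothetical and are not taken from the dissertation.

```python
import numpy as np


def fit_transfer_svm(X, y, w_s, C=1.0, mu=1.0, n_iter=500, lr=0.01):
    """Linear SVM on target data with a penalty pulling w toward w_s.

    Assumed objective (not the dissertation's exact formulation):
        0.5*||w||^2 + 0.5*mu*||w - w_s||^2
        + C * sum_i max(0, 1 - y_i * (w . x_i + b))
    """
    w_s = np.asarray(w_s, dtype=float)
    w = w_s.copy()  # start from the source-domain weights
    b = 0.0
    for _ in range(n_iter):
        margins = y * (X @ w + b)
        active = margins < 1.0  # samples that violate the margin
        # Subgradient of the objective with respect to w and b.
        grad_w = w + mu * (w - w_s) - C * (X[active].T @ y[active])
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Tiny illustrative target set: two labelled points on the first axis.
X_t = np.array([[2.0, 0.0], [-2.0, 0.0]])
y_t = np.array([1.0, -1.0])
w_source = np.array([0.0, 1.0])  # hypothetical source-domain weights

w_plain, _ = fit_transfer_svm(X_t, y_t, w_source, mu=0.0)
w_transfer, _ = fit_transfer_svm(X_t, y_t, w_source, mu=10.0)
print("without transfer penalty:", w_plain)
print("with transfer penalty:   ", w_transfer)
```

In this toy setting the target samples carry no information about the second coordinate, so with mu = 0 that component of w shrinks toward zero, whereas with mu > 0 it is held near the source value. Larger mu expresses stronger trust in the source domain.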
Keywords/Search Tags: domain adaptation, minimum enclosing ball (MEB), core set, support vector domain description (SVDD), support vector machine (SVM), L2 kernel classifier (L2KC), reduced set density estimation (RSDE), linear combination space of densities