A Study On Cross-domain Classification And Its Application

Posted on: 2015-04-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Gu
GTID: 1488304313952679
Subject: Light Industry Information Technology and Engineering
Abstract/Summary:
The information revolution has driven rapid development in information technology, and machine learning is now widely used across many fields. Extracting useful information from large volumes of complex data is very important in data mining, data correction, data prediction, and other fields. As research and applications have developed, however, traditional machine learning methods have shown various limitations that affect recognition rate and recognition speed. One important reason is that traditional machine learning adapts poorly to cross-domain and multi-domain learning. Traditional methods assume that the training data and the testing data follow the same distribution, but this assumption is often violated in the real world. When such methods are applied across domains, a series of problems arises: outdated training data and models, classifier bias, the high cost of labeled samples, and poor robustness to noise. These problems heavily degrade classification performance and reduce the accuracy and efficiency of machine learning.

Cross-domain learning does not strictly require that the training data and the testing data follow the same distribution; it can transfer knowledge between domains with different data distributions. It can therefore accelerate the learning of new tasks and minimize the impact of distribution differences by drawing on past learning experience and the relationships between different tasks. Cross-domain learning has attracted a lot of attention and has produced a series of achievements, but a review of current results shows that many problems remain. These include how to overcome the low recognition rate caused by different data distributions in the source and target domains, how to improve domain adaptation, how to handle privacy preservation and data interference, how to achieve cross-domain learning with multiple source domains or multiple tasks, how to deal with imbalanced domains, and how to learn quickly on large multi-domain datasets. To address these problems, this dissertation carries out the following research:

(1) From the perspective of improving domain adaptation, the algorithm MEB-DA (Minimum Enclosing Ball for Domain Adaptation) is proposed. To allow fast computation on large datasets, a novel algorithm named CCMEB-DA (Center Constrained Minimum Enclosing Ball for Domain Adaptation) is also proposed. By calculating the center of each dataset, we can correct the dataset, perform intrusion detection, or identify the similarity of data between different domains. As validation, we test the algorithm on WIFI (Wireless Fidelity) indoor positioning and face detection, and the experimental results indicate the effectiveness and domain adaptation ability of the proposed algorithm. Data from different domains can thus be classified quickly as a whole.

(2) From the perspective of minimizing the maximum distribution distance among different domains, we integrate the MEB (Minimum Enclosing Ball) algorithm with Parzen window probability estimation to develop a new transfer learning method named MEBTL (Minimum Enclosing Ball Transfer Learning). We also use CVM (Core Vector Machine) theory to develop a fast version of the proposed algorithm, CCMEBTL, for large datasets. We first calculate the center of the target domain and then compare probability estimates to determine the degree of difference between the source domain and the target domain, so as to find the internal relations between different domains and complete transfer learning on large datasets.

(3) From the perspective of large datasets, interference resistance, and privacy protection, SVM and CCMEB are combined with probability distribution theory to formulate a novel domain adaptation approach (CCMEB-SVMDA). A support vector machine (SVM) attempts to find an optimal separating hyperplane for binary classification problems in a high-dimensional space. CCMEB, proposed by I. Tsang as an improvement of the CVM, is particularly suitable for training on large datasets. By calculating the similarity of each dataset, we can correct the dataset or classify data across different domains. The algorithm is strongly resistant to disturbance: by enhancing the similarity between the source domain and the target domain, we can eliminate the misleading effect of harmful samples and improve classification accuracy.

(4) From the perspective of learning across multiple source domains, a multi-source cross-domain classification algorithm, MSCC, is proposed on the basis of the logistic regression model and a proposed consensus measure, to achieve effective cross-domain classification for the target domain. To make MSCC work well on large datasets, its fast version, MSCC-CDdual, is derived and theoretically analysed on the basis of CDdual, a recent advance in large-scale logistic regression. The experimental results indicate that the proposed algorithm MSCC-CDdual offers high speed, high classification accuracy, and good domain adaptation for large multi-source cross-domain datasets.

(5) From the perspective of imbalanced domains and multi-task learning, a novel fast cross-domain classification method is proposed. We introduce a multi-task coupled logistic regression and MAP framework called MTC-LR, a new method that generates a classifier for each task and is capable of sharing the commonality among multi-task domains; this feature helps to solve the overfitting or underfitting caused by imbalanced domains. The basic idea of MTC-LR is to use individual logistic-regression-based classifiers, one for each task domain, and to learn the parameter vectors of all the individual classifiers globally with the conjugate gradient method, without the kernel trick; the method is easily extended to a scaled version. It can easily be integrated with a state-of-the-art fast logistic regression algorithm called CDdual to develop its fast version, MTC-LR-CDdual, for large multi-task datasets. Our experimental results on artificial and real datasets indicate the effectiveness of the proposed algorithm MTC-LR-CDdual in classification accuracy, speed, interference resistance, robustness to imbalance, and general robustness.
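
The center-based comparison mentioned in item (1) can be illustrated with a small sketch. This is not the MEB-DA or CCMEB-DA algorithm from the dissertation, whose details are not given in the abstract. It swaps in the simple Badoiu-Clarkson iteration, a standard way to approximate a minimum enclosing ball, and then compares the ball centers of two domains. The function names `approx_meb` and `center_shift`, the iteration count, and the normalization by the larger radius are all assumptions made for illustration.

```python
import math


def approx_meb(points, iterations=1000):
    """Approximate the minimum enclosing ball of `points`.

    Uses the Badoiu-Clarkson iteration: repeatedly move the center
    a shrinking step toward the current farthest point.
    Returns (center, radius).
    """
    center = list(points[0])
    for i in range(1, iterations + 1):
        # The farthest point from the current center drives the update.
        farthest = max(points, key=lambda p: math.dist(p, center))
        step = 1.0 / (i + 1)
        center = [c + step * (f - c) for c, f in zip(center, farthest)]
    radius = max(math.dist(p, center) for p in points)
    return center, radius


def center_shift(source, target, iterations=1000):
    """Hypothetical dissimilarity score between two domains.

    Distance between the two ball centers, divided by the larger
    radius (an illustrative normalization, not taken from the source).
    """
    c_s, r_s = approx_meb(source, iterations)
    c_t, r_t = approx_meb(target, iterations)
    return math.dist(c_s, c_t) / max(r_s, r_t)


# Toy data: a square "source" domain and the same square shifted right.
source = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
target = [(x + 4.0, y) for x, y in source]

center, radius = approx_meb(source)
print("source center:", center, "radius:", radius)
print("shift source -> target:", center_shift(source, target))
```

A score near zero suggests that the two domains occupy a similar region of feature space, while a large score suggests a distribution shift that a domain adaptation method would need to correct. Only the ball centers and radii are compared here, which is why such a summary can be computed without exchanging the raw samples.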
Keywords/Search Tags: Cross-Domain, Classification, Minimum Enclosing Ball (MEB), Logistic Regression (LR), Multi-Task, Support Vector Machine (SVM)