
Research Of Transfer Learning Algorithm Based On Sample Feature Distance

Posted on: 2018-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: H R Duan
GTID: 2428330518958871
Subject: Computer application technology
Abstract/Summary:
In traditional machine learning, the task is to learn a classification model from a sufficient number of training samples. In some new domains, however, sufficient training samples are difficult to obtain. Traditional machine learning also assumes that the training samples and the testing samples follow the same distribution, an assumption that does not hold in many real-world applications. For example, the training samples may change over time, and annotating new training samples is costly. At the same time, completely discarding the existing training samples is wasteful, even when their distribution differs significantly from that of the new data.

Transfer learning accomplishes the learning task in the target domain by using transferable knowledge from related domains, with the aim of reducing the amount of labeled data required and improving learning performance in the target domain. Sample-based transfer learning does this by selecting a subset of the samples in the related source domains and transferring them to the target domain. However, existing transfer learning algorithms remain imperfect in three respects: how sample weights are set, how unlabeled samples in the target domain are used, and how the base classifiers obtained in each iteration are used. This thesis addresses these problems in three parts.

Firstly, this thesis improves the TrAdaBoost algorithm for single-source data. The method updates the sample weights in the source and target domains according to the error of the base classifier obtained in each iteration. Experimental analysis demonstrates the effectiveness of the algorithm.

Secondly, this thesis proposes a multisource transfer learning algorithm based on sample feature distance. It defines the feature distance from a sample to a domain and calculates feature weights from the covariance between the source domain and the target domain. Samples in the source domains are then selected on the basis of this feature distance, according to the corresponding weights. The iterative process is further improved by means of dynamic factors. The algorithm makes full use of the unlabeled samples in the target domain and of the base classifiers, and it mitigates the problem of weights dropping too fast. A series of experiments on the Letter-recognition and 20 Newsgroups datasets evaluates the applicability of the proposed method in terms of classification accuracy, time efficiency, and the effect on overall performance of the number of samples selected from the source domains, the number of source domains, and the number of target samples. Experimental analysis shows that the proposed method achieves some improvement over existing methods.

Thirdly, the multisource transfer learning algorithm based on sample feature distance is applied to cross-domain sentiment analysis. Experimental results show the feasibility of the proposed algorithm.
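
For orientation, the per-iteration weight update that the abstract refers to can be sketched for the standard, unmodified TrAdaBoost algorithm with binary labels. This is not the improved variant proposed in the thesis, whose details the abstract does not give. The function name `tradaboost_weight_update`, its parameters, and the clipping of the error below 0.5 are assumptions made for illustration only.

```python
import math


def tradaboost_weight_update(weights, errors, n_source, n_rounds):
    """One round of the standard TrAdaBoost weight update (binary labels).

    weights  : current sample weights, source samples first, then target
    errors   : |h(x_i) - y_i| per sample, 0 if classified correctly, 1 if not
    n_source : number of source-domain samples at the front of both lists
    n_rounds : total number of boosting rounds N
    """
    target_w = weights[n_source:]
    target_e = errors[n_source:]

    # Weighted error of the base classifier, measured on target samples only.
    eps = sum(w * e for w, e in zip(target_w, target_e)) / sum(target_w)
    # Assumption: clip the error so that beta_t stays well defined and below 1.
    eps = min(max(eps, 1e-10), 0.499)

    beta_t = eps / (1.0 - eps)
    # Fixed factor for source samples; depends only on n_source and n_rounds.
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_source) / n_rounds))

    updated = []
    for i, (w, e) in enumerate(zip(weights, errors)):
        if i < n_source:
            # Misclassified source samples lose weight.
            updated.append(w * beta_src ** e)
        else:
            # Misclassified target samples gain weight.
            updated.append(w * beta_t ** (-e))
    return updated, eps


# Example: 2 source samples followed by 4 target samples, one error in each group.
new_weights, eps = tradaboost_weight_update(
    [1.0] * 6, [1, 0, 1, 0, 0, 0], n_source=2, n_rounds=10
)
```

In this update, a misclassified source sample is treated as less relevant to the target domain and is down-weighted, while a misclassified target sample is up-weighted as in AdaBoost. Because the source factor is fixed and always below 1, source weights can shrink quickly over many rounds, which is the kind of rapid weight decay the abstract says the thesis sets out to mitigate.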
Keywords/Search Tags: Transfer learning, Multisource transfer learning, Sample feature, Distance