Research Of Random Forest Transfer Learning Based On Instance

Posted on: 2019-05-02  Degree: Master  Type: Thesis
Country: China  Candidate: D Li  Full Text: PDF
GTID: 2428330572960747  Subject: Computer Science and Technology
Abstract/Summary:
Traditional machine learning requires a large amount of training data and assumes that the test data and the training data follow the same distribution. In many practical applications, however, this requirement is difficult to meet. For example, a new target task often lacks sufficient training data for a classification model, while training data in a similar field is plentiful. If old data that do not satisfy the same-distribution assumption can be applied to the new field in a reasonable manner, the heavy labor of labeling data can be avoided. In this case, transfer learning can transfer knowledge from existing data to the new field and help train new models. Transfer learning is generally built on existing machine learning algorithms such as decision trees and boosting. Building on an analysis and summary of previous work, this thesis applies those approaches to the random forest and implements two transfer learning methods, both of which are instance-based.

(1) Random forest transfer learning based on information gain. The source and target domain samples are trained simultaneously, and the optimal parameters of the classification function at each classification node are obtained from the mixed information gain. The Mahalanobis distance is used to measure the distance between leaf nodes, and the class labels of leaf nodes that contain target domain training samples are passed to leaf nodes that contain only source domain training samples, so that sample classes can be predicted. Finally, the method is compared with other methods on the MNIST data set to verify the effectiveness of the transfer learning.

(2) Random forest transfer learning based on covariate shift. Samples are randomly selected in equal numbers from the source and target domains to grow a forest, called the candidate forest. The outputs of two forests, the candidate forest and the random forest trained on the source domain, are used to estimate the covariate loss between the two domains. This covariate loss is used to evaluate the distance between source and target domain samples and to iteratively reweight the source domain samples. Once the candidate forest has generated enough decision trees, a subset of those trees is selected to form the final transfer random forest. Finally, experiments on the INRIA and Daimler Mono data sets verify the effectiveness of the transfer.
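
The abstract does not state the exact form of the mixed information gain used in method (1). As a rough illustration only, the sketch below assumes it is a convex combination of the information gain computed on the target samples and on the source samples for the same candidate split. The names `mixed_gain`, `best_threshold`, and the weight `lam` are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(xs, ys, threshold):
    """Gain from splitting 1-D feature values xs (labels ys) at threshold."""
    n = len(ys)
    if n == 0:
        return 0.0
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(ys) - children


def mixed_gain(src, tgt, threshold, lam=0.5):
    """ASSUMED form: lam * target gain + (1 - lam) * source gain.

    src and tgt are (xs, ys) pairs; lam is a hypothetical mixing weight.
    """
    return (lam * information_gain(*tgt, threshold)
            + (1 - lam) * information_gain(*src, threshold))


def best_threshold(src, tgt, candidates, lam=0.5):
    """Pick the candidate threshold with the largest mixed gain."""
    return max(candidates, key=lambda t: mixed_gain(src, tgt, t, lam))


# Toy data: plentiful source samples, scarce target samples.
source = ([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])
target = ([1.5, 3.5], [0, 1])
chosen = best_threshold(source, target, [1.0, 2.5, 3.0])
```

In this sketch a larger `lam` lets the scarce target samples dominate the choice of split, while a smaller `lam` leans on the source samples. How the thesis actually balances the two domains is not specified in the abstract.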
Keywords/Search Tags: Transfer Learning, Random Forest, Information Gain, Covariate Loss