Research Of Random Forest Transfer Learning Based On Instance

Posted on:2019-05-02

Degree:Master

Type:Thesis

Country:China

Candidate:D Li

GTID:2428330572960747

Subject:Computer Science and Technology

Abstract/Summary:

Traditional machine learning requires a large amount of training data and assumes that the test data and the training data follow the same distribution. In many practical applications, however, this requirement is difficult to meet. For example, a new target task often lacks sufficient training data for a classification model, while training data in a similar field is plentiful. If old data that do not satisfy the same-distribution assumption can be applied to the new field in a reasonable manner, the heavy labor of labeling data can be avoided. In this setting, transfer learning transfers knowledge from existing data to the new field and helps train new models. Transfer learning is generally built on existing machine learning algorithms such as decision trees and boosting. Building on an analysis and summary of previous work, this thesis applies the approaches used there to the random forest and implements two transfer learning methods, both of which are instance-based.

(1) Random forest transfer learning based on information gain. The source and target domain samples are trained simultaneously, and the optimal parameters of the classification function at each classification node are obtained from the mixed information gain. The Mahalanobis distance is used to measure the distance between leaf nodes, and the class labels of leaf nodes that contain target domain training samples are passed to leaf nodes that contain only source domain training samples, so that the forest can predict the class of a sample. Finally, the method is compared with other methods on the MNIST data set to verify the effectiveness of the transfer learning.

(2) Random forest transfer learning based on covariate shift. Samples are drawn randomly and in equal numbers from the source and target domains to generate a forest, known as the candidate forest. The outputs of two forests, the candidate forest and a random forest trained on the source domain, are used to estimate the covariate loss between the two domains. This covariate loss is used to evaluate the distance between source and target domain samples and to iteratively reweight the source domain samples. Once the candidate forest has generated enough decision trees, a subset of these trees is selected to form the final transfer random forest. Finally, experiments on the INRIA and Daimler Mono data sets verify the effectiveness of the transfer.
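
As an illustration of the first method, the following minimal Python sketch shows one way a mixed information gain could be used to pick a split threshold from source and target samples together. The abstract does not give the formula, so the linear mixture, the weight lam, and all function names here are assumptions made for illustration and are not the thesis's actual definition.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in bits) of an array of class labels."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(x, y, threshold):
    """Information gain of splitting samples by x <= threshold."""
    if y.size == 0:
        return 0.0
    left = x <= threshold
    n = y.size
    children = (left.sum() / n) * entropy(y[left]) \
        + ((~left).sum() / n) * entropy(y[~left])
    return entropy(y) - children


def mixed_gain(xs, ys, xt, yt, threshold, lam=0.5):
    """Assumed mixture: lam weights the target gain, (1 - lam) the source gain."""
    return lam * information_gain(xt, yt, threshold) \
        + (1.0 - lam) * information_gain(xs, ys, threshold)


def best_threshold(xs, ys, xt, yt, lam=0.5):
    """Try every observed feature value and keep the one with the highest mixed gain."""
    candidates = np.unique(np.concatenate([xs, xt]))
    gains = [mixed_gain(xs, ys, xt, yt, t, lam) for t in candidates]
    return float(candidates[int(np.argmax(gains))])


# Toy one-dimensional example: many source samples, few target samples.
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([0, 0, 1, 1])
xt = np.array([0.5, 2.5])
yt = np.array([0, 1])
print(best_threshold(xs, ys, xt, yt))
```

Setting lam closer to 1 makes the split follow the scarce target samples more closely, while a value closer to 0 relies mainly on the source domain. How the thesis balances the two is not stated in the abstract.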

Keywords/Search Tags:

Transfer Learning, Random Forest, Information gain, Covariate loss

Related items

1	Research On Key Technologies Of Personal Behavior Prediction Based On Random Forest
2	Research And Application Of Random Forest Technology In News Page Classification Systems
3	Research Of Data Integration Based On Random Forest
4	Research On Software Fault Prediction Method Based On Transfer Learning And PU Learning
5	Visual Interpretation And Analysis Of Random Forest
6	Transfer Learning For Bayesian Network Parameter
7	Class-Imbalanced Data Stream Classification Method Based On Adaptive Random Forest
8	Research On ELM Image Classification Combining HOG And Random Forest
9	Decentralized Vertical Federated Learning Based On Random Forest
10	Application Of Learning-to-rank Method Based On Random Forest In Self-made Dataset