
Application Of Random Forest In Microfinance

Posted on: 2019-12-13
Degree: Master
Type: Thesis
Country: China
Candidate: D W Wang
Full Text: PDF
GTID: 2428330566977576
Subject: Applied Statistics
Abstract/Summary:
This paper addresses the classification and prediction of peer-to-peer (P2P) loan data. The data set contains 100,000 samples (rows), with 20 columns in the original data and 104 columns after cleaning. The main classifier is the random forest algorithm, which is compared with logistic regression and the support vector machine (SVM).

Random forest is an ensemble classifier: it draws bootstrap samples (sampling with replacement) from the training data and combines multiple decision trees. It is one of many ensemble learning algorithms. Ensembles come in two kinds, "homogeneous" and "heterogeneous". A random forest, which combines many decision trees, is a "homogeneous" ensemble; this paper refers to the combination of random forest, logistic regression, and SVM as a "heterogeneous" ensemble. By combining several individual learners into one model, ensemble learning can often achieve better predictive results than a single learner.

This paper applies ensemble learning to a P2P credit model that predicts, from customer information, whether a customer will become overdue in the future. Two models are discussed: a random forest model, and a blend of random forest, logistic regression, and SVM.

First, for the random forest classifier, the data are upsampled. Normal repayments account for more than 80% of the P2P loan data and overdue loans for only a little over 10%, so the classes are imbalanced. If the random forest is trained directly on these data, its prediction accuracy is limited. To prevent this, SMOTE upsampling is applied to increase the proportion of overdue samples so that they can be predicted more accurately.

Second, the three models are fused. Because the random forest alone already performs well on the loan data, the paper tries merging it with two other well-performing models (logistic regression and SVM). The three are combined by weighting to improve model performance.
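
The weighted combination of the three models can be illustrated with a minimal sketch. The abstract does not say which weights or decision threshold the thesis uses, so the probabilities, the weights (0.5, 0.25, 0.25), the 0.5 threshold, and the function names below are all hypothetical, chosen only to show how a weighted average of predicted overdue probabilities could work.

```python
from typing import List, Sequence


def blend_probabilities(model_probs: Sequence[Sequence[float]],
                        weights: Sequence[float]) -> List[float]:
    """Weighted average of per-customer overdue probabilities.

    model_probs holds one sequence of probabilities per model;
    weights holds one non-negative weight per model.
    """
    if len(model_probs) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_samples = len(model_probs[0])
    if any(len(p) != n_samples for p in model_probs):
        raise ValueError("all models must score the same customers")
    # Normalise by the weight sum so the result stays a probability.
    return [
        sum(w * p[i] for w, p in zip(weights, model_probs)) / total
        for i in range(n_samples)
    ]


def predict_overdue(blended: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Label a customer 1 (overdue) when the blended probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in blended]


# Hypothetical overdue probabilities for four customers from each model.
rf_probs = [0.9, 0.2, 0.6, 0.1]      # random forest
logit_probs = [0.8, 0.3, 0.4, 0.2]   # logistic regression
svm_probs = [0.7, 0.1, 0.5, 0.3]     # SVM

# Assumed weights: the random forest counts for half of the blended score.
weights = [0.5, 0.25, 0.25]

blended = blend_probabilities([rf_probs, logit_probs, svm_probs], weights)
labels = predict_overdue(blended)
```

In practice the weights would be tuned on held-out data rather than fixed by hand, and the per-model probabilities would come from fitted classifiers instead of literal lists.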
Keywords/Search Tags: Random forest, logistic, bootstrap, Support vector machine, model fusion