Research On Prediction Of Personal Credit Default By Improved Random Forest Model

Posted on: 2021-04-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Tao
GTID: 2439330620463708
Subject: Applied statistics
Abstract/Summary:
China's continued economic growth has raised consumption levels and intensified competition across industries. For financial institutions such as banks to keep improving their overall strength, they should actively develop loan business while maintaining traditional banking business, continuously improve their management of loan default forecasting, raise the security of their loan business, and operate funds efficiently so that they bring greater economic benefits. For banks and other financial institutions, loans are the backbone of asset operations, which places higher demands on the accuracy of customer loan default prediction: even a model prediction error of only 0.01% may cost these institutions hundreds of millions in losses. This thesis therefore takes personal credit as its research object and uses an improved random forest model to predict whether a personal loan will default, which can provide a reference for credit prediction at banks and other financial institutions.

The thesis compares and analyzes three commonly used prediction models: decision tree, logistic regression, and random forest. It finds that the decision tree model is prone to overfitting and also blurs the correlations between variables, so its predictive performance is poor; logistic regression has certain limitations in processing large volumes of high-dimensional data; and random forest is a highly applicable model that mines data efficiently, fits well, can handle large volumes of high-dimensional data, and is well suited to personal credit data. However, generating a random forest produces many decision trees with poor classification results, and these trees reduce the prediction accuracy of the final model. This thesis therefore introduces an improved random forest model for credit prediction.

The improved algorithm, motivated by an analysis of the characteristics of personal credit data, introduces the AUC value into the random forest. The AUC value of each decision tree in the original random forest is calculated and taken as the classification accuracy of that tree, and the trees are then sorted by AUC from largest to smallest. After multiple experiments comparing the accuracy obtained with different numbers of decision trees, the best-classifying trees are selected to form the improved random forest model. This addresses the problem that poorly classifying trees degrade the prediction of the entire random forest, and it improves the accuracy of the model.

Finally, the improved random forest model is compared with the original random forest, logistic regression, and decision tree models in terms of the ROC curve and the AUC value. The results show that the improved random forest model has the highest prediction accuracy and is suitable for personal credit evaluation.
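The tree-selection step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the toy validation labels, the per-tree scores, and the `keep` parameter are all assumptions made for illustration. In practice the per-tree scores would come from the individual trees of a trained forest (for example, the `estimators_` of a scikit-learn `RandomForestClassifier`), and the number of trees to keep would be chosen by repeated experiments, as the abstract states.

```python
from typing import List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def select_trees_by_auc(
    labels: Sequence[int], tree_scores: Sequence[Sequence[float]], keep: int
) -> List[int]:
    """Rank trees by validation AUC (largest first) and return the top `keep` indices."""
    ranked = sorted(
        range(len(tree_scores)),
        key=lambda i: auc(labels, tree_scores[i]),
        reverse=True,
    )
    return ranked[:keep]


def ensemble_scores(
    tree_scores: Sequence[Sequence[float]], selected: Sequence[int]
) -> List[float]:
    """Average the selected trees' scores for each sample."""
    n_samples = len(tree_scores[0])
    return [
        sum(tree_scores[i][j] for i in selected) / len(selected)
        for j in range(n_samples)
    ]


# Hypothetical validation set: 1 = default, 0 = no default.
labels = [1, 0, 1, 0, 1, 0]
# Hypothetical default probabilities from three trees on that set.
tree_scores = [
    [0.9, 0.1, 0.8, 0.2, 0.7, 0.3],  # tree 0: ranks every defaulter above every non-defaulter
    [0.2, 0.8, 0.3, 0.9, 0.4, 0.6],  # tree 1: ranks them the wrong way round
    [0.9, 0.4, 0.3, 0.2, 0.8, 0.6],  # tree 2: mostly right
]

selected = select_trees_by_auc(labels, tree_scores, keep=2)
full_auc = auc(labels, ensemble_scores(tree_scores, range(len(tree_scores))))
pruned_auc = auc(labels, ensemble_scores(tree_scores, selected))
print(selected, full_auc, pruned_auc)
```

On this toy data the weakest tree is dropped and the pruned ensemble ranks the validation samples better than the full one. To avoid an optimistic estimate, the AUC used for ranking should be computed on data that was not used to train the trees.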
Keywords/Search Tags: Loan Default, Random Forest, Decision Tree, Logistic Regression, AUC Evaluation Index