
Research On Personal Credit Scoring Mixed Model Based On Random Forest And Logistic Regression

Posted on: 2021-09-02
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Yang
GTID: 2480306248455694
Subject: Applied Statistics
Abstract/Summary:
Personal credit consumption is becoming more and more important in daily life. Financial institutions usually use a personal credit scoring model to calculate a customer's credit score and evaluate the customer's ability to repay loans, so as to determine lending quotas and reduce the incidence of bad loans. A personal credit scoring model with good prediction performance not only helps financial institutions guard against bad loans, but also promotes personal credit consumption and improves the consumer market.

The German Credit Dataset was used in this analysis. Random forest and logistic regression were combined to build a mixed model for personal credit scoring. Because random forest has strong learning capacity, its feature selection algorithm was first used to rank the feature variables by importance. The variables selected by importance were then entered into a logistic regression, which is regarded as having good robustness and interpretability. Finally, the mixed model and three independent models (decision tree, random forest and logistic regression) were compared on predictive capability, including accuracy, robustness and interpretability.

The results showed that, among the three independent models, random forest had the highest prediction accuracy, followed by logistic regression and decision tree (76.0%, 73.5% and 71.0% on the test data, respectively). Compared with decision tree and logistic regression, random forest had greater learning performance and could more accurately assign samples to the correct credit subset. In terms of robustness, measured by the absolute difference between the prediction accuracy on the training set and on the test set, logistic regression was the best and random forest was the worst. The mixed model based on random forest and logistic regression used fewer feature variables than the independent models. On the test data, the sensitivity, specificity and prediction accuracy of the mixed model were 67.16%, 78.20% and 74.50%, respectively, which were higher than those of the logistic regression model; the mixed model also had high robustness and interpretability. The findings suggest that it is feasible to combine different prediction models so that they complement each other and improve predictive capability.
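The evaluation measures named in the abstract (sensitivity, specificity, prediction accuracy, and robustness as the absolute gap between training and test accuracy) can be illustrated with a minimal Python sketch. The abstract does not give a confusion matrix, a test-set size, or which credit class is treated as "positive", so the counts below are hypothetical: they assume a 200-sample test set and were chosen only because they reproduce the reported percentages. The names `ConfusionCounts` and `robustness_gap` are likewise illustrative, not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary confusion matrix."""
    tp: int  # positive cases predicted as positive
    fn: int  # positive cases predicted as negative
    tn: int  # negative cases predicted as negative
    fp: int  # negative cases predicted as positive


def sensitivity(c: ConfusionCounts) -> float:
    """Share of actual positives that the model identifies."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of actual negatives that the model identifies."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Share of all cases that are classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def robustness_gap(train_accuracy: float, test_accuracy: float) -> float:
    """Absolute difference between training and test accuracy.

    A smaller gap indicates a more robust model, following the
    definition of robustness used in the abstract.
    """
    return abs(train_accuracy - test_accuracy)


# Hypothetical counts (an assumption, not data from the thesis): one
# confusion matrix for a 200-sample test set that is consistent with
# the reported 67.16% / 78.20% / 74.50%.
example = ConfusionCounts(tp=45, fn=22, tn=104, fp=29)

print(f"sensitivity = {sensitivity(example):.2%}")  # 67.16%
print(f"specificity = {specificity(example):.2%}")  # 78.20%
print(f"accuracy    = {accuracy(example):.2%}")     # 74.50%
```

The abstract does not report training-set accuracies, so `robustness_gap` is left without a worked value; given a model's training accuracy, it would be called as `robustness_gap(train_accuracy, accuracy(example))`.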
Keywords/Search Tags: Personal Credit Scoring, Random Forest, Logistic Regression, Decision Tree