Research On Credit Scoring Model Based On Machine Learning

Posted on: 2020-01-31; Degree: Master; Type: Thesis
Country: China; Candidate: Y F Lin; Full Text: PDF
GTID: 2428330575952050; Subject: Applied statistics
Abstract/Summary:
Predicting users' future credit performance from their historical loan behavior data is a persistent topic in Internet finance. Taking the historical loan data of a lending institution as an example, this paper builds a credit scoring model and a credit scoring system to predict user default and reduce default risk. First, the loan data are preprocessed and the indicators are screened. Data preprocessing mainly addresses invalid data, highly concentrated values, missing values, outliers, and inconsistencies. To account for both the predictive ability of each variable and the correlation between variables, IV (information value) screening is combined with correlation detection. In the first step, the variables undergo WOE (weight of evidence) binning and their IV values are calculated; in the second step, variables with an IV value below 0.02 are eliminated; in the third step, a correlation test is performed, and for each pair of variables whose correlation coefficient exceeds 0.6, the one with the higher IV value is retained. In the end, 11 variables are selected as credit score indicators. Second, these indicators are used to build the credit scoring model. Logistic regression, the traditional credit scoring model, and the XGBoost algorithm, chosen for its high accuracy, are used to build models, which are evaluated and compared using AUC and KS. Empirical results show that the XGBoost model (KS = 0.3290, AUC = 0.7181) outperforms the logistic regression model (KS = 0.3129, AUC = 0.7052) on both KS and AUC, so XGBoost is chosen as the final credit scoring model. According to the variable importance ranking output by the XGBoost model, income plays an important role in whether a customer defaults; the amount of the loan or credit card contract, the number of months of loan prepayment, and education level also have an important impact. Finally, a scorecard is built from the prediction results of the XGBoost model, and users are divided into four credit grades, from high to low: A, B, C, and D. Grade D users have a default probability above 50% and form the high-risk group for default; loan applications from such users should be refused.
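
The three-step variable screening described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation. It assumes the common definitions WOE = ln(share of goods / share of bads) per bin and IV = the sum over bins of (share of goods - share of bads) × WOE. The function names `woe_iv` and `screen_variables`, the input layout, and the greedy handling of correlated pairs are all hypothetical choices made for this example; only the thresholds 0.02 and 0.6 come from the abstract.

```python
import math


def woe_iv(bins):
    """Return (woe_per_bin, iv) for one binned variable.

    `bins` is a list of (good_count, bad_count) pairs, one pair per bin.
    The sign convention ln(good share / bad share) is an assumption; the
    IV is the same under the opposite convention.
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    woes, iv = [], 0.0
    for good, bad in bins:
        if good == 0 or bad == 0:
            # Assumption: empty cells are rejected rather than smoothed.
            raise ValueError("each bin needs at least one good and one bad case")
        p_good = good / total_good  # share of all non-defaulters in this bin
        p_bad = bad / total_bad     # share of all defaulters in this bin
        woe = math.log(p_good / p_bad)
        woes.append(woe)
        iv += (p_good - p_bad) * woe
    return woes, iv


def screen_variables(iv_by_var, corr, iv_min=0.02, corr_max=0.6):
    """Drop low-IV variables, then resolve highly correlated pairs.

    `iv_by_var` maps variable name -> IV.
    `corr` maps frozenset({a, b}) -> correlation coefficient; missing
    pairs are treated as uncorrelated. Using the absolute value of the
    coefficient is an assumption of this sketch.
    """
    # Step 2 of the abstract: eliminate variables with IV below the threshold.
    kept = [name for name, iv in iv_by_var.items() if iv >= iv_min]
    # Step 3: visit variables from highest to lowest IV, so that within a
    # correlated pair the higher-IV variable is the one retained.
    kept.sort(key=lambda name: iv_by_var[name], reverse=True)
    selected = []
    for name in kept:
        if all(abs(corr.get(frozenset((name, other)), 0.0)) <= corr_max
               for other in selected):
            selected.append(name)
    return selected
```

In this sketch, `woe_iv` would be called once per binned variable to fill `iv_by_var`, and `screen_variables` would then return the retained indicators. The greedy pass is only one reading of "the one with the higher IV value is retained"; where several variables are mutually correlated, a strictly pairwise rule could keep a different set.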
Keywords/Search Tags: IV value, Logistic regression, XGBoost, Credit score