| In recent years, driven by the rapid development of China’s economy, a significant rise in residents’ living standards and a shift in consumption attitudes toward overconsumption, China’s personal credit business has grown rapidly. However, credit defaults occur frequently, which poses a serious challenge to the risk-control capabilities of China’s financial industry. Banks and other financial institutions therefore need to evaluate the credit risk of individuals and make sound lending decisions. Using historical real-world data from a commercial bank, this paper forecasts individual credit default. Before the empirical analysis, missing values were filled and the classes were balanced. Four feature-selection approaches were then applied: Lasso, the MV index combined with chi-square distance, stepwise regression, and IV (information value) combined with random forest. On the selected features, KNN, logistic regression, random forest, LightGBM and neural network models were built. Finally, a Stacking ensemble was built from the models that performed poorly as single classifiers. Among the single classifiers, the combination of Lasso, SMOTE sampling and LightGBM performs best and can perfectly distinguish default from non-default users. On single classifiers, the features selected by the MV index with chi-square distance and by IV with random forest perform worse than those selected by Lasso. The Stacking ensemble does not significantly improve the classification metrics, but it still helps identify default users and can serve as an alternative way to predict personal credit default. The main contributions of this paper are as follows. First, class balancing is performed before the analysis. Second, taking the characteristics of each variable type fully into account, KNN and random forest are applied to categorical and numerical variables, respectively. Third, IV values are computed for preliminary screening and an embedded random forest is used for further screening, which reduces data redundancy and helps avoid overfitting. The resulting model accurately identifies default users and can help financial institutions take a more forward-looking approach to this line of business. |
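The best single classifier combines Lasso, SMOTE sampling and LightGBM. SMOTE balances the classes by creating synthetic minority-class (here, default) samples on line segments between a minority sample and one of its nearest minority neighbours. The following is a minimal pure-Python sketch of that interpolation, assuming numeric feature vectors and Euclidean distance. The function name `smote` and its parameters are hypothetical, and a real pipeline would more likely use an existing library implementation.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    minority -- list of numeric feature vectors from the minority (default) class;
                at least two samples are required
    k        -- number of nearest minority neighbours to sample from
    seed     -- seed for reproducible sampling
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x by Euclidean distance, excluding x itself
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nb
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region already occupied by defaulters rather than duplicating existing rows, which is the usual argument for SMOTE over plain random oversampling.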