User Credit Default Prediction Based On Feature Selection

Posted on: 2022-07-23

Degree: Master

Type: Thesis

Country: China

Candidate: H Wan

Full Text: PDF

GTID: 2518306722981919

Subject: Applied Statistics

Abstract/Summary:

In recent years, driven by the rapid development of China's economy, a significant improvement in residents' living standards, and a shift in consumption attitudes toward excessive consumption, China's personal credit business has grown rapidly. However, credit defaults occur frequently and pose a serious challenge to the risk-control capability of China's financial industry. Banks and other financial institutions therefore need to evaluate the credit risk of individuals in order to make sound lending decisions. Using real historical data from a commercial bank, this thesis predicts individual credit default. Before the empirical analysis, missing values were filled and equalization treatment was applied to the data. Feature selection was then carried out with Lasso, MV index and chi-square distance, stepwise regression, IV value, and random forest. KNN, logistic regression, random forest, LightGBM, and neural network models were built on the selected features. Finally, Stacking integration was applied, using the models that performed poorly as single classifiers to predict results. The analysis shows that the best single-classifier performance comes from the combination of Lasso, SMOTE sampling, and LightGBM, which can perfectly distinguish between default and non-default users. As single classifiers, the models built on MV index and chi-square distance, IV value, and random forest all perform worse than those built on Lasso. Stacking integration does not significantly improve the classification evaluation metrics, but it still helps identify default users and can serve as an alternative way to predict personal credit default. The main innovations of this thesis are as follows. First, equalization treatment is completed before the analysis. Second, taking the characteristics of the variables into full consideration, KNN and random forest are applied to categorical and numerical variables, respectively. Third, the IV value is calculated for preliminary filtering and an embedded random forest is used for further filtering, in order to reduce data redundancy and avoid overfitting. The analysis indicates that the model established here can accurately identify default users and make financial institutions more forward-looking in this line of business.
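
The preliminary filtering step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that "IV value" refers to the information value commonly used in credit scoring, computed per bin from the shares of non-default and default users. The function names, the additive smoothing constant, and the 0.02 cut-off are illustrative assumptions and are not taken from the thesis.

```python
import math
from collections import defaultdict


def information_value(bins, labels, smoothing=0.5):
    """Information value of one binned feature against a binary label.

    bins   -- iterable of hashable bin identifiers, one per user
    labels -- iterable of 0 (non-default) or 1 (default), one per user
    """
    # bin -> [non-default count, default count]
    counts = defaultdict(lambda: [0, 0])
    for b, y in zip(bins, labels):
        counts[b][y] += 1

    n_bins = len(counts)
    total_good = sum(c[0] for c in counts.values())
    total_bad = sum(c[1] for c in counts.values())

    iv = 0.0
    for good, bad in counts.values():
        # Additive smoothing (an assumption of this sketch) keeps the
        # logarithm finite when a bin holds only one class.
        p_good = (good + smoothing) / (total_good + smoothing * n_bins)
        p_bad = (bad + smoothing) / (total_bad + smoothing * n_bins)
        iv += (p_good - p_bad) * math.log(p_good / p_bad)
    return iv


def filter_by_iv(binned_features, labels, threshold=0.02):
    """Keep the names of features whose IV reaches the threshold.

    binned_features -- dict mapping feature name to its list of bin ids
    threshold       -- hypothetical cut-off; 0.02 is a common rule of thumb
    """
    return [
        name
        for name, bins in binned_features.items()
        if information_value(bins, labels) >= threshold
    ]


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    features = {
        "uninformative": ["a", "b", "a", "b"],  # same mix in both classes
        "separating": ["a", "a", "b", "b"],     # bins follow the label
    }
    print(filter_by_iv(features, labels))
```

Features that pass such a filter would then go to the embedded random forest stage for further selection, as the abstract describes; that stage is not sketched here.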

Keywords/Search Tags:

credit default, feature selection, unbalanced data, Stacking integration

Related items

1	Network Credit Default Prediction Based On Ensemble Learning
2	P2P Online Loan Default Risk Early Warning Research
3	Personal Credit Evaluation Method Based On Integration Of XGBoost And LR
4	Design And Analysis Of Personal Credit Data Default Mining Model
5	Credit Default Detection Based On Deep Heterogeneous Stacking Model
6	Research On Feature Selection Method In Bank Credit Card Default Prediction
7	Analysis And Research On Unbalanced Data Of Credit Score Based On Stacking Integrated Algorithm
8	Research On High-dimensional Unbalanced Data Classification Algorithm Based On Feature Selection And Ensemble Learning
9	Personal Credit Risk Assessment Under Unbalanced Data Sets
10	Research On Credit Evaluation Method Based On Mixed Sampling And Stacking Integration