
The Construction And Visual Expression Of Personal Credit Default Prediction Model Of Commercial Bank

Posted on: 2022-05-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Tu
GTID: 2517306722481924
Subject: Applied Statistics
Abstract/Summary:
The big data era gives banks multiple sources of personal data and richer personal credit profiles. As commercial banks accumulate these data resources, one of the problems they face is how to integrate and use them to assess personal credit risk more comprehensively. Based on bank customer data, this thesis builds a credit default prediction model to identify customers who are likely to default and to uncover the key factors that drive default, which helps banks reduce the economic losses caused by defaulting customers. The thesis first constructs default-related features from the existing variables and handles outliers and missing values. Numerical variables are log-transformed and normalized, and categorical variables are encoded. It then compares the strengths and weaknesses of four feature selection approaches (variance filtering, mutual information, embedded methods, and recursive feature elimination) in order to find a feature selection scheme suited to each prediction model. Next, logistic regression, Random Forest, XGBoost, and LightGBM models are built on the 58 original features under three sampling methods. On this basis, the best feature selection scheme for each model is applied under the same three sampling methods to further optimize the models. After comparative analysis, the best model in this thesis is LightGBM with oversampling and feature selection. On the test set its recall is 60.9%, its F1 score is 53.92%, and its AUC is 75.68%, which indicates that the model has good overall classification ability and also performs well at identifying defaulting customers. Finally, SHAP, an interpretation framework for ensemble models, is used to visualize the contribution of each variable in the best-performing LightGBM model, which compensates for the limited ability of complex models to explain which variables matter.
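The preprocessing, oversampling, and evaluation steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the thesis's implementation: the abstract does not say which log transform, normalization, or oversampling variant was used, so `log1p`, min-max scaling, and random duplication of minority rows are assumptions, and the function names and toy data are hypothetical.

```python
import math
import random


def log_minmax(values):
    """Apply log(1 + x), then rescale the result to [0, 1] (min-max)."""
    logged = [math.log1p(v) for v in values]
    lo, hi = min(logged), max(logged)
    if hi == lo:  # constant feature: avoid division by zero
        return [0.0 for _ in logged]
    return [(v - lo) / (hi - lo) for v in logged]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until classes balance.

    Assumption: plain random oversampling; the thesis may use another variant.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:  # nothing to duplicate
        return list(rows), list(labels)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(labels))) + extra
    return [rows[i] for i in idx], [labels[i] for i in idx]


def recall_f1(y_true, y_pred):
    """Recall and F1 for the positive (default = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, f1


# Toy data (hypothetical): one numeric feature, one defaulter out of four.
incomes = [0, 1000, 5000, 20000]
labels = [0, 0, 0, 1]
scaled = log_minmax(incomes)
bal_rows, bal_labels = random_oversample([[s] for s in scaled], labels)
recall, f1 = recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

In practice, oversampling would be applied only to the training split, so that test-set recall, F1, and AUC are measured on the original class distribution.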
Keywords/Search Tags: Credit default prediction, imbalanced data, LightGBM, SHAP (Shapley) interpretation framework