At present, the scientific and accurate approval of personal credit is a main focus of major banks and related financial institutions, because it directly affects whether loans are ultimately recovered. Lending institutions therefore need a reliable indicator system for personal loan default, together with a scientific and accurate loan risk prediction model that can mine users' latent information and automatically identify their borrowing behavior. A classic and reliable loan default prediction model is a classifier based on XGBoost. Its features are usually screened through feature engineering combined with the information value (IV), which measures the predictive power of each feature variable. However, IV is computed in an essentially linear way, whereas XGBoost is a nonlinear model, so the variables selected by IV are not well matched to the XGBoost model. In this paper, Gibbs sampling under the MCMC framework is used to screen and extract the features that affect personal loan outcomes, with XGBoost serving as the evaluation tool in a random search for the associated feature factors. The variables obtained in this way better match what the XGBoost model requires. Compared with the traditional feature screening method, the similarities and differences between the two feature systems are analyzed, and a classic XGBoost model is built on each of the two indicator systems. On the performance metrics used, the model built on the Gibbs-selected features outperforms the XGBoost model built on IV-screened features. During model construction it also became clear that the machine learning model is not interpretable, so the SHAP explanation method is added to explain in detail how each indicator affects the XGBoost model's classification and prediction results.
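The Gibbs-sampling feature screening described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: each feature j gets a binary inclusion indicator, the target distribution over subsets is assumed to be proportional to exp(score(subset) / temperature), and each indicator is resampled from its full conditional given the others. In the paper's setting, `score` would be some XGBoost performance measure on the selected columns; here a hypothetical `toy_score` (an additive reward for two "informative" features minus a size penalty) stands in so the sketch runs without XGBoost. The function name, temperature, iteration counts, and burn-in are all assumptions.

```python
import math
import random
from typing import Callable, List, Sequence


def gibbs_feature_selection(
    n_features: int,
    score: Callable[[Sequence[int]], float],
    n_iter: int = 200,
    burn_in: int = 50,
    temperature: float = 0.05,
    seed: int = 0,
) -> List[float]:
    """Gibbs sampling over binary inclusion indicators gamma_j (sketch).

    Assumed target: P(gamma) proportional to exp(score(gamma) / temperature).
    Returns each feature's inclusion frequency over post-burn-in sweeps.
    """
    if n_iter <= burn_in:
        raise ValueError("n_iter must exceed burn_in")
    rng = random.Random(seed)
    gamma = [rng.randint(0, 1) for _ in range(n_features)]
    counts = [0] * n_features
    kept = 0
    for it in range(n_iter):
        for j in range(n_features):
            # Score the current subset with feature j switched on, then off.
            gamma[j] = 1
            s_on = score(gamma)
            gamma[j] = 0
            s_off = score(gamma)
            # Full conditional P(gamma_j = 1 | others) is a logistic function
            # of the score difference; evaluated in an overflow-safe way.
            diff = (s_on - s_off) / temperature
            if diff >= 0:
                p_on = 1.0 / (1.0 + math.exp(-diff))
            else:
                p_on = math.exp(diff) / (1.0 + math.exp(diff))
            gamma[j] = 1 if rng.random() < p_on else 0
        if it >= burn_in:
            kept += 1
            for j in range(n_features):
                counts[j] += gamma[j]
    return [c / kept for c in counts]


# Hypothetical stand-in for a model-based score (e.g. cross-validated AUC):
# +0.1 for each truly informative feature, -0.02 per selected feature.
USEFUL = {0, 2}


def toy_score(gamma: Sequence[int]) -> float:
    selected = {j for j, g in enumerate(gamma) if g}
    return 0.1 * len(selected & USEFUL) - 0.02 * len(selected)


if __name__ == "__main__":
    freq = gibbs_feature_selection(5, toy_score, temperature=0.01)
    print([round(f, 2) for f in freq])
```

Features whose inclusion frequency stays high after burn-in are the ones the sampler keeps selecting, which is how such a chain can be read as a feature screen. With xgboost and scikit-learn installed, `score` could instead wrap something like `cross_val_score(XGBClassifier(), X[:, cols], y, scoring="roc_auc").mean()` on the selected columns (returning a baseline value for an empty subset). Each score evaluation then requires model training, so in practice the iteration count would be kept small.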