| In the post-pandemic era, the recovery of household consumption has been weak and its underlying growth rate is lower than before the pandemic. To expand domestic demand and consumption, the central government and credit institutions have actively promoted the growth of consumer credit, and significant results have been achieved so far. For credit institutions, however, growth in the total amount and number of consumer loans also brings corresponding risks, including market risk, credit risk, liquidity risk and operational risk. Of these, credit risk is the greatest concern and the biggest threat to the credit business. Early credit risk prediction relied mainly on manual review, which is time-consuming, complicated and costly. With the rapid development of the personal loan business, such methods can no longer meet the needs of credit institutions. To make credit risk prediction intelligent, efficient and accurate, major credit institutions, individuals and scholars at home and abroad have repeatedly applied machine learning, deep learning and other methods to credit default prediction, but problems such as low accuracy and efficiency of credit risk assessment remain. These may stem from incomplete mining of the valid information in the data or from unsuitable model algorithms. It is therefore urgent to mine the hidden information in personal loan data more thoroughly, to improve the ability to identify loan defaults, and to establish a more efficient, intelligent and accurate credit risk prediction model. Based on the historical information observed and collected up to the time borrowers apply for loans, together with process-observable information (information that becomes observable during the loan process), this thesis proposes a post-loan credit risk prediction model based on the Blending algorithm and applies it to individual loan data from the Lending Club platform from 2017 to 2020. For this data set, the empirical analysis is carried out mainly from 
the following aspects. First, the data set is analyzed visually and preprocessed, including data cleaning steps such as data filtering and missing-value handling. The features are processed through feature derivation, feature abstraction, XMB feature scaling and feature selection. In feature derivation, appropriate process-observable variables need to be selected. Feature selection uses random forest as the embedded method and the Pearson correlation coefficient as the filter method. After visual analysis, the data set is balanced by Borderline-SMOTE oversampling. Second, a post-loan credit risk prediction model based on the Blending model is constructed for the data set containing process-observable variables. The first layer of the Blending model uses strong learners with relatively high precision, such as Random Forest, LightGBM and CatBoost, while the second-layer meta-model uses logistic regression. Finally, to verify the validity of the constructed model, pre-loan credit risk prediction models based on Random Forest, LightGBM and CatBoost are constructed using data sets that do not contain process-observable variables, and classification performance is evaluated by the accuracy and AUC of each model on the test set. The empirical results show that, among the four models, the credit risk prediction model based on the Blending algorithm achieves the highest accuracy and AUC, 86.55% and 0.9352 respectively, which are on average 1.92% and 2.15% higher than those of the other three models. Therefore, compared with the traditional pre-loan credit risk prediction model based on a single machine learning algorithm, the in-process credit risk prediction model based on the Blending algorithm can improve the accuracy of loan default prediction to some extent, and it offers lending institutions an approach to evaluating whether clients will default during the post-loan process. It helps credit institutions take relevant measures for 
customers with a high risk of default in a timely manner, so as to reduce the direct economic losses and subsequent collection costs caused by customers’ defaults. |