| Lending has expanded rapidly as the demand for loans from financial institutions has diversified. However, the national economy has been in a downturn in recent years and, under the impact of this year's epidemic in particular, the unemployment rate continues to rise, which greatly increases the likelihood that borrowers will default. When a large number of customers default, both commercial banks and lending companies suffer heavy losses. An early-warning model for individual loan default is therefore important for the risk control and long-term development of financial institutions. This paper uses mainstream data mining techniques to build such a model, so that potential defaulters can be predicted in time and financial institutions can reduce their risks and economic losses. The paper first collects, organizes and reviews the research of domestic and foreign scholars and describes the basic ideas and characteristics of the algorithms used. Then, using borrower data collected from a domestic financial institution, relevant variables are selected along five dimensions: credit history, interpersonal communication, identity characteristics, behavioral preferences and contract performance ability. The data are cleaned, the information value (IV) of each variable is calculated, and variables with an IV greater than 0.02 are retained as the basis for subsequent modeling. Next, non-defaulting borrowers are clustered and sampled by stratum, and the resulting samples are matched with the defaulting borrowers to form 24 balanced data sets for modeling. Seven types of individual loan default warning model are built on each balanced data set: neural network, support vector machine, logistic regression, decision tree, gradient boosting decision tree (GBDT), random forest and XGBoost. Seven evaluation metrics (accuracy, precision, recall, AUC, KS, F1 and G-mean) are used to evaluate and compare the predictive performance of the models. The principal component comprehensive evaluation method is then used to select 16 optimal models, which are combined by weighted linear combination into a principal component combination model. Finally, the predictions of the combination model are compared with those of the seven types of single model. The results show that the principal component default identification combination model established in this paper identifies personal loan default risk better than the neural network, support vector machine, logistic regression, decision tree, GBDT, random forest and XGBoost models. On the test set, the combination model identifies a considerable proportion of defaulting borrowers while avoiding excessive misclassification of non-defaulting borrowers. The research in this paper can therefore support risk control at financial institutions and help them reduce unnecessary economic losses. |
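
The IV-based screening step can be illustrated with a minimal sketch. It assumes each variable has already been discretized into bins and uses the standard information value formula, IV = Σ (p_good − p_bad) · ln(p_good / p_bad), summed over bins. The function names `information_value` and `select_by_iv`, the smoothing constant `eps` and the toy data are hypothetical and are not taken from the paper; only the 0.02 threshold comes from the abstract.

```python
import math
from collections import Counter


def information_value(bins, defaults, eps=0.5):
    """Information value of one binned variable.

    bins:     bin label for each borrower
    defaults: 1 for a defaulting borrower, 0 for a non-defaulting one
    eps:      hypothetical smoothing count added when a bin has no
              defaulters or no non-defaulters, to avoid log(0)
    """
    good = Counter()  # non-defaulting borrowers per bin
    bad = Counter()   # defaulting borrowers per bin
    for b, y in zip(bins, defaults):
        if y == 1:
            bad[b] += 1
        else:
            good[b] += 1

    total_good = sum(good.values())
    total_bad = sum(bad.values())
    if total_good == 0 or total_bad == 0:
        raise ValueError("both classes must be present to compute IV")

    iv = 0.0
    for b in set(good) | set(bad):
        g, d = good[b], bad[b]
        if g == 0 or d == 0:
            g, d = g + eps, d + eps
        p_good = g / total_good
        p_bad = d / total_bad
        iv += (p_good - p_bad) * math.log(p_good / p_bad)
    return iv


def select_by_iv(binned_columns, defaults, threshold=0.02):
    """Keep the variables whose IV exceeds the threshold (0.02 in the abstract)."""
    ivs = {name: information_value(col, defaults)
           for name, col in binned_columns.items()}
    return {name: iv for name, iv in ivs.items() if iv > threshold}


# Toy data (made up): eight borrowers, two binned variables.
defaults = [0, 0, 0, 1, 0, 1, 1, 1]
columns = {
    "informative": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "uninformative": ["a", "b", "a", "b", "b", "a", "b", "a"],
}
selected = select_by_iv(columns, defaults)
```

In this toy example only the `informative` variable passes the threshold, because the two classes are distributed identically across the bins of the `uninformative` one. How the paper bins continuous variables is not stated in the abstract, so that step is left out of the sketch.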