| When constructing a personal credit default prediction model, class imbalance in the training set often severely biases the classifier's predictions: most users are classified as normal, while the defaulting users who matter most are not effectively identified. Building on traditional credit risk control research, this paper therefore explores a personal credit default prediction model that combines class-balancing strategies with ensemble learning. First, it reviews the main directions and basic ideas of recent credit risk research and discusses the feasibility of combining class-balancing strategies with ensemble learning to improve the identification of the minority class of defaulting users. Second, it surveys common strategies for handling imbalanced datasets, focusing on the principle and implementation of the SMOTE algorithm. As an effective oversampling technique, SMOTE synthesizes new minority samples rather than duplicating existing ones. This mitigates the overfitting caused by simple random oversampling and substantially improves the model's ability to identify minority samples. To address SMOTE's shortcomings in synthesizing new samples, the paper then highlights three improved balancing strategies: oversampling based on the Kmeans-SMOTE algorithm, oversampling based on the LightGBM-SMOTE algorithm, and weighted random sampling. These methods remedy the weaknesses of SMOTE from the perspectives of sample specificity and sample distribution. Next, a personal credit default prediction model is built on data from a lending platform. Data preprocessing consists mainly of feature engineering, outlier handling, correlation analysis, and feature-importance ranking. Combined prediction models are then constructed by applying three classic single classifiers and two ensemble base classifiers to the processed dataset. Comparing the evaluation metrics of these models yields the following conclusions: (1) applying sample-balancing strategies effectively improves classification performance; (2) the improved class-balancing strategies improve the model's identification of minority samples more than the original SMOTE strategy does; (3) the Kmeans-SMOTE algorithm and the weighted random sampling strategy achieve the best classification performance; (4) ranking each variable's contribution to the predictions of the optimal model, from largest to smallest, shows that several factors strongly affect default risk, including average income, average salary, education level, and the repayment amount of the previous period. Based on these findings, this paper recommends the following: when building a credit default prediction model, imbalanced data should be balanced using multiple balancing strategies; during credit review, attention should focus on the borrower's wage income, education, and repayment behavior; and because borrowers with incomplete information are usually more likely to default than other users, such applicants should be screened in advance. |