| In bank lending, a borrower's default is a direct financial loss for the bank, and a borrower's indicators are related to whether a default will occur. For commercial banks and other financial institutions, a failure to repay principal and interest within the specified period increases liquidity risk and also harms the stability of the financial markets. A good model for predicting loan default risk can therefore support banks and financial institutions in their investment decisions and strengthen their risk control. To address the loan default problem, this thesis improves the single models and proposes a loan default prediction model based on model fusion, using the dataset of the Tianchi loan competition and working at both the data level and the model level. The main research contents are as follows. First, a hybrid sampling model, LOF + SMOTE-Tomek Links, is built for the imbalanced loan data: to deal with the class imbalance in the experimental data and the effect of noise points on the prediction model, hybrid sampling narrows the gap between the numbers of positive and negative samples, balances the dataset, and lays the foundation for subsequent model training. Second, the FL-XGBoost and FL-LightGBM default prediction models are constructed: on the hybrid-sampled data, XGBoost and LightGBM are improved at the model level by adopting Focal Loss as the loss function. The two sample weight coefficients of the loss function are set manually by enumerating combinations of commonly used values at common step sizes, and the combination that improves model performance is chosen. Third, grid search is used to tune parameters one at a time, and the resulting optimal parameter combination is applied to the constructed models. Finally, the three ensemble fusion methods Stacking, Voting and Blending are compared on the improved models; FL-XGBoost, FL-LightGBM and logistic regression are then fused by Stacking and by soft Voting, and three groups of comparative experiments are set up and analyzed. In these experiments the model reaches an AUC of 0.9349, which demonstrates the effectiveness of the fusion-based loan default prediction model with improvements at both the data level and the model level. Applying this model to predict user default can improve prediction accuracy and thereby reduce the risk of loss. As loans become increasingly widespread, this contributes to the healthy development of the financial market. |
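
To make the Focal Loss idea concrete, the following is a minimal sketch of the standard binary focal loss, not the thesis's actual implementation. It assumes that the two weight coefficients mentioned above correspond to the usual class-balance weight `alpha` and focusing parameter `gamma`; these names, the default values 0.25 and 2.0, and the helper functions are illustrative assumptions that do not come from the source.

```python
import math


def focal_loss(y_true, p_pred, alpha=0.25, gamma=2.0, eps=1e-12):
    """Mean binary focal loss.

    y_true: iterable of 0/1 labels (1 = default, by assumption).
    p_pred: iterable of predicted probabilities for class 1.
    alpha, gamma: assumed names for the two weight coefficients.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clamp to avoid log(0).
        p = min(max(p, eps), 1.0 - eps)
        # p_t is the probability assigned to the true class.
        p_t = p if y == 1 else 1.0 - p
        # alpha_t weights the positive class by alpha, the negative by 1 - alpha.
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t) ** gamma shrinks the loss of well-classified samples.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)


def cross_entropy(y_true, p_pred, eps=1e-12):
    """Plain mean binary cross-entropy, for comparison."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -math.log(p if y == 1 else 1.0 - p)
    return total / len(y_true)


if __name__ == "__main__":
    labels = [1, 0, 1, 0]
    probs = [0.9, 0.2, 0.4, 0.7]
    print("cross-entropy:", cross_entropy(labels, probs))
    print("focal loss:   ", focal_loss(labels, probs))
```

With `gamma = 0` and `alpha = 0.5` the sketch reduces to half the ordinary cross-entropy, which is a convenient sanity check. Using such a loss inside XGBoost or LightGBM would additionally require its gradient and Hessian with respect to the raw score, which this sketch does not include.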