Personal credit overdue prediction is one of the key problems in controlling financial risk. Traditional machine learning methods build classification models from credit users' loan characteristics. Among them, ensemble learning is the mainstream of current applied research, because it guards against overfitting while maintaining high accuracy. Compared with homogeneous ensembles, heterogeneous ensembles (Stacking methods) can fully combine the strengths of their base models to improve performance. The choice of base models and hyperparameters strongly affects prediction accuracy, so the core issue in application is how to combine domain knowledge with adaptive, sample-driven parameter selection. In this thesis, we use the Stacking method to fuse multiple single models for credit user overdue prediction. We also optimize the hyperparameters of the base models and select suitable parameter combinations to improve model performance. The main contributions are as follows.

(1) Dataset selection and pre-processing. Drawing on the findings of domain experts, we identify age, family situation, historical overdue records, historical transactions and everyday consumption as the key factors influencing credit overdue prediction. Desensitized UnionPay business data (the CUP dataset) is selected as the dataset. Missing-value handling, class-imbalance handling and feature dimensionality reduction are applied to address the missing data, redundant features and high dimensionality in this dataset.

(2) Construction of the XRG-Stacking overdue prediction model. To improve the accuracy of personal credit overdue prediction, this thesis constructs the XRG-Stacking model. It is a heterogeneous ensemble whose base models are XGBoost, random forest and GBDT, each of which is itself a homogeneous ensemble. On the CUP dataset, XRG-Stacking is compared with traditional classification algorithms and with its individual base classifiers. Relative to the best of these, XGBoost, it is 1.7%, 1.3%, 0.9%, 2.1% and 1% higher in accuracy, precision, recall, F1-score and AUC, respectively.

(3) Proposal and construction of the IMPBO-XRG-Stacking model. XRG-Stacking is sensitive to the parameters of its base models, so a Bayesian optimization framework is used to tune them. Standard Bayesian optimization can become trapped in local optima, so a dynamic adaptive balancing factor is designed to improve the acquisition function. Experiments show that the improved Bayesian optimization algorithm performs best compared with standard Bayesian optimization, grid search, random search, simulated annealing and genetic algorithms. Compared with the optimized single models, IMPBO-XRG-Stacking improves accuracy, precision, recall, F1-score and AUC by 1.5%, 1.6%, 1.8%, 1.0% and 0.6%, respectively. The IMPBO-XRG-Stacking model also achieves good predictive performance on both the Taiwan customer loan overdue record dataset and the CUP dataset.

(4) Design and implementation of a prototype credit overdue prediction system. Following a software engineering development model, we first conducted a requirements analysis. We then built the system with the Django framework and implemented a user management module, a customer information display module and a data prediction module. The result is a prototype credit overdue prediction system based on the proposed algorithm.
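The stacking idea behind contribution (2) can be illustrated with a minimal, dependency-free sketch. It assumes a common scheme in which each base model's scores are produced out-of-fold and a logistic-regression meta-learner is then trained on those scores. The thesis does not state its meta-learner or fold count, so both are assumptions here. All function names are hypothetical. The toy `nearest_mean_learner` stands in for XGBoost, random forest and GBDT only so that the example needs no third-party libraries.

```python
import math
from typing import Callable, List

Row = List[float]
Scorer = Callable[[Row], float]
# A learner fits on (rows, labels) and returns a scorer giving P(overdue) for one row.
Learner = Callable[[List[Row], List[int]], Scorer]


def nearest_mean_learner(feature: int) -> Learner:
    """Hypothetical toy base model (stand-in for XGBoost / RF / GBDT): scores 1.0
    when the chosen feature is closer to the overdue-class mean than the other."""
    def fit(X: List[Row], y: List[int]) -> Scorer:
        pos = [r[feature] for r, t in zip(X, y) if t == 1]
        neg = [r[feature] for r, t in zip(X, y) if t == 0]
        mu_p = sum(pos) / len(pos) if pos else 0.0
        mu_n = sum(neg) / len(neg) if neg else 0.0
        return lambda r: 1.0 if abs(r[feature] - mu_p) < abs(r[feature] - mu_n) else 0.0
    return fit


def kfold_indices(n: int, k: int) -> List[List[int]]:
    """Split range(n) into k contiguous folds (unshuffled, for reproducibility)."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def out_of_fold_features(X: List[Row], y: List[int],
                         learners: List[Learner], k: int = 5) -> List[Row]:
    """Level-1 features: entry [i][j] is learner j's score for row i, produced by
    a copy of learner j that never saw row i during training (avoids leakage)."""
    n = len(X)
    meta = [[0.0] * len(learners) for _ in range(n)]
    for fold in kfold_indices(n, k):
        held_out = set(fold)
        X_tr = [X[i] for i in range(n) if i not in held_out]
        y_tr = [y[i] for i in range(n) if i not in held_out]
        for j, learn in enumerate(learners):
            score = learn(X_tr, y_tr)
            for i in fold:
                meta[i][j] = score(X[i])
    return meta


def fit_logistic(Z: List[Row], y: List[int], lr: float = 0.5, epochs: int = 500) -> Scorer:
    """Assumed meta-learner: logistic regression fitted by full-batch gradient descent."""
    w = [0.0] * (len(Z[0]) + 1)  # last weight is the bias

    def prob(z: Row) -> float:
        # zip stops at len(z), so the bias w[-1] is added separately
        return 1.0 / (1.0 + math.exp(-(sum(wi * zi for wi, zi in zip(w, z)) + w[-1])))

    for _ in range(epochs):
        grad = [0.0] * len(w)
        for z, t in zip(Z, y):
            err = prob(z) - t
            for j, zj in enumerate(z):
                grad[j] += err * zj
            grad[-1] += err
        w = [wi - lr * g / len(Z) for wi, g in zip(w, grad)]
    return prob


def fit_stacking(X: List[Row], y: List[int], learners: List[Learner], k: int = 5) -> Scorer:
    """Train the meta-learner on out-of-fold scores, then refit every base
    learner on all data so the final predictor can score new rows."""
    meta_model = fit_logistic(out_of_fold_features(X, y, learners, k), y)
    base_models = [learn(X, y) for learn in learners]
    return lambda r: meta_model([m(r) for m in base_models])
```

The out-of-fold step is the design choice that matters here. If the meta-learner were trained on scores from base models that had already seen the same rows, it would learn to trust overfitted base predictions. In a real pipeline, scikit-learn's `StackingClassifier` performs the same cross-validated stacking with actual XGBoost, random forest and GBDT estimators.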
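Contribution (3) states only that a dynamic adaptive balancing factor modifies the acquisition function. It does not give the formula. The sketch below is a hedged illustration, not the thesis's IMPBO method. It uses the standard Expected Improvement (EI) acquisition, whose parameter ξ (`xi`) balances exploitation against exploration. It pairs EI with a hypothetical `AdaptiveXi` rule that enlarges ξ while the best score stalls and shrinks it after an improvement. The Gaussian-process surrogate that would supply the posterior mean `mu` and standard deviation `sigma` for each candidate hyperparameter setting is omitted.

```python
import math


def norm_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu: float, sigma: float, best: float, xi: float) -> float:
    """Standard EI for maximisation (e.g. of validation AUC) at a candidate whose
    surrogate posterior is N(mu, sigma^2); larger xi favours exploration."""
    imp = mu - best - xi
    if sigma <= 0.0:
        return max(imp, 0.0)
    z = imp / sigma
    return imp * norm_cdf(z) + sigma * norm_pdf(z)


class AdaptiveXi:
    """Hypothetical dynamic balancing factor (not the thesis's formula): multiply
    xi by `grow` after an iteration with no improvement, by `shrink` after one
    with an improvement, clamped to [xi_min, xi_max]."""

    def __init__(self, xi0: float = 0.01, grow: float = 2.0, shrink: float = 0.5,
                 xi_min: float = 1e-4, xi_max: float = 0.5) -> None:
        self.xi, self.grow, self.shrink = xi0, grow, shrink
        self.xi_min, self.xi_max = xi_min, xi_max

    def update(self, improved: bool) -> float:
        if improved:
            self.xi = max(self.xi * self.shrink, self.xi_min)
        else:
            self.xi = min(self.xi * self.grow, self.xi_max)
        return self.xi
```

Consider an incumbent best of 0.80. A candidate with `mu=0.82, sigma=0.005` receives a higher EI than one with `mu=0.78, sigma=0.05` when `xi=0`, but the ranking flips at `xi=0.05`. A growing ξ therefore steers the search away from a region that has stopped improving, which is the kind of local-optimum escape the abstract describes.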