The development of artificial intelligence and financial technology has disrupted financial products and business models, bringing convenience to people while also creating a large number of credit risk problems. How to assess and predict credit default risk efficiently and accurately has therefore become an urgent problem for banks and other financial institutions. Aggregating large volumes of information, such as borrowers' basic attributes and lending-related records, and using machine learning algorithms to build risk control models is one of the mainstream research directions in financial risk control. Based on the serial fusion strategy and the idea of stacking model fusion, this paper proposes a deep heterogeneous stacking model fusion method. The method uses multiple ensemble models that can automatically combine features as feature generators to construct new features, merges the new features with the original training set, and then trains multiple ensemble models at the next level. Multi-layer fusion is achieved by continuously updating the training set in this way, and logistic regression is finally used as the prediction model to output the results. Using the dataset from the financial risk control competition jointly launched by Datawhale and Tianchi, 17 features are selected for modeling through data cleaning, missing value imputation, binning and encoding, and feature selection. Then, with Random Forest, LightGBM, and XGBoost as the base learners at each level, three deep heterogeneous stacking models for credit default detection are constructed, all of which improve significantly on the base learners and on other model fusion methods. In particular, the 2-layer deep heterogeneous stacking model performs best: compared with LightGBM, the best-performing base learner, its KS, AUC, and F1 scores increase by 3%, 1.4%, and 1.6%, respectively. In addition, the effectiveness of the method is further verified on the "Give Me Some Credit" competition dataset from Kaggle. The proposed deep heterogeneous stacking model fusion method combines the advantages of multiple types of algorithms, performs well in terms of robustness and the ability to discriminate between samples, and has high generality and application value in predicting credit default risk.
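The layered procedure summarized above can be illustrated with a minimal Python sketch. It is not the implementation used in this paper, and several choices are assumptions made only for illustration: the generated features are taken to be out-of-fold predicted default probabilities, scikit-learn's `GradientBoostingClassifier` and `ExtraTreesClassifier` stand in for LightGBM and XGBoost so that the example needs only scikit-learn, `n_layers` counts the base-learner levels before the final logistic regression, and the data are synthetic. The helper names (`make_base_learners`, `fit_layer`, `deep_stacking_predict`, `ks_statistic`) are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import StratifiedKFold, train_test_split


def make_base_learners(seed):
    # Assumption: stand-ins for the Random Forest, LightGBM and XGBoost learners.
    return [
        RandomForestClassifier(n_estimators=50, random_state=seed),
        GradientBoostingClassifier(n_estimators=50, random_state=seed),
        ExtraTreesClassifier(n_estimators=50, random_state=seed),
    ]


def fit_layer(X_train, y_train, X_test, seed, n_splits=5):
    """Return one generated feature column per base learner (train and test)."""
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    learners = make_base_learners(seed)
    new_train = np.zeros((X_train.shape[0], len(learners)))
    new_test = np.zeros((X_test.shape[0], len(learners)))
    for j, model in enumerate(learners):
        # Out-of-fold predictions: each training row is scored by a model
        # that never saw its label, which limits leakage into the new feature.
        for fit_idx, hold_idx in folds.split(X_train, y_train):
            model.fit(X_train[fit_idx], y_train[fit_idx])
            new_train[hold_idx, j] = model.predict_proba(X_train[hold_idx])[:, 1]
        # Refit on the full training set to generate the test-set feature.
        model.fit(X_train, y_train)
        new_test[:, j] = model.predict_proba(X_test)[:, 1]
    return new_train, new_test


def deep_stacking_predict(X_train, y_train, X_test, n_layers=2, seed=0):
    cur_train, cur_test = X_train, X_test
    for layer in range(n_layers):
        new_train, new_test = fit_layer(cur_train, y_train, cur_test, seed + layer)
        # Serial fusion: merge the generated features into the current data set.
        cur_train = np.hstack([cur_train, new_train])
        cur_test = np.hstack([cur_test, new_test])
    # Logistic regression on the final, augmented feature set gives the output.
    meta = LogisticRegression(max_iter=1000)
    meta.fit(cur_train, y_train)
    return meta.predict_proba(cur_test)[:, 1]


def ks_statistic(y_true, scores):
    # KS: largest gap between the cumulative score distributions of the
    # positive and negative classes, i.e. max(TPR - FPR) over all thresholds.
    fpr, tpr, _ = roc_curve(y_true, scores)
    return float(np.max(tpr - fpr))


# Synthetic, imbalanced data with 17 features (illustrative only).
X, y = make_classification(n_samples=600, n_features=17, n_informative=8,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)
proba = deep_stacking_predict(X_tr, y_tr, X_te, n_layers=2)
auc = roc_auc_score(y_te, proba)
ks = ks_statistic(y_te, proba)
print(f"AUC={auc:.3f}  KS={ks:.3f}")
```

Setting `n_layers=1` in this sketch reduces it to ordinary stacking with the original features passed through to the meta-learner, so comparing different values of `n_layers` on a held-out set is one way to examine how depth affects KS and AUC.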