
Research On Personal Loan Default Prediction Based On Stacking Ensemble Learning

Posted on: 2024-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: H B Yang
GTID: 2530307145954529
Subject: Applied Statistics
Abstract/Summary:
In today's fast-moving era, risk management plays a vital role in the credit industry, and the diversity of business scenarios makes regulation and management increasingly difficult. Although current risk management techniques have made significant progress, challenges remain, such as slow approval processes, high labor costs, and insufficient customer verification. An accurate and reliable loan default prediction model would therefore provide strong support for the long-term, stable development of China's credit industry. This paper builds such a model using a public dataset from the official Data Fountain competition platform. The personal loan default dataset was preprocessed, binned, feature-encoded, and subjected to feature selection. The processed data were split into a training set and a test set in a 4:1 ratio, and three sampling methods were used to address the class imbalance: SMOTE oversampling, random oversampling, and SMOTETomek combined sampling. Loan default prediction models based on AdaBoost, GBDT, XGBoost, CatBoost, and LightGBM were trained with five-fold cross-validation and then evaluated and compared on the test set. After the imbalanced data were processed with random oversampling, the LightGBM model achieved the highest AUC among the single models, 0.8797. To further improve predictive performance and overcome the limitations of any single model, the Stacking algorithm was used for model fusion on the randomly oversampled dataset with the optimized models: LightGBM serves as the meta-model, and XGBoost, GBDT, and AdaBoost serve as the base models, forming a two-layer LightGBM_Stacking fusion model for loan default prediction. With AUC as the evaluation metric for comparing the single models and the fusion model, the fusion model achieves the highest AUC, 0.8818, making the LightGBM_Stacking loan default prediction model the optimal model in this paper.
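The pipeline described in the abstract (a 4:1 train/test split, random oversampling of the training set only, two-layer stacking built from five-fold out-of-fold predictions, and AUC on the held-out set) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's code: a tiny logistic regression, the hypothetical helper `fit_logreg`, stands in for XGBoost, GBDT, AdaBoost, and the LightGBM meta-model so the example needs only the standard library. The data are synthetic, and `random_oversample`, `stack`, and `auc` are illustrative names.

```python
import math
import random

# Assumption: fit_logreg is a hypothetical stand-in for every learner in the
# thesis (XGBoost/GBDT/AdaBoost bases, LightGBM meta-model).
def fit_logreg(X, y, lr=0.1, epochs=100):
    """Fit a logistic regression by batch gradient descent and return a scorer."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)

    def score(x):
        z = sum(wj * xj for wj, xj in zip(w, x)) + b
        return 1.0 / (1.0 + math.exp(-z))

    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = score(xi) - yi  # gradient of log-loss w.r.t. the logit
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return score


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are equal in size."""
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = pos + neg + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def auc(y_true, scores):
    """AUC = P(random positive outscores random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def stack(base_fitters, meta_fitter, X, y, k=5, seed=0):
    """Two-layer stacking: out-of-fold base-model scores become the meta-model's features."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]
    oof = [[0.0] * len(base_fitters) for _ in X]
    for held in folds:
        held_set = set(held)
        train = [i for i in order if i not in held_set]
        for m, fit in enumerate(base_fitters):
            model = fit([X[i] for i in train], [y[i] for i in train])
            for i in held:
                oof[i][m] = model(X[i])  # row i is not among this fold's training indices
    meta = meta_fitter(oof, y)
    bases = [fit(X, y) for fit in base_fitters]  # refit on all training rows for inference
    return lambda x: meta([base(x) for base in bases])


# Synthetic, imbalanced toy data (not the Data Fountain dataset): about 20% positives.
rng = random.Random(42)
X, y = [], []
for _ in range(500):
    label = int(rng.random() < 0.2)
    X.append([rng.gauss(1.0 * label, 1.0), rng.gauss(0.5 * label, 1.0)])
    y.append(label)

cut = int(0.8 * len(X))  # 4:1 train/test split
X_tr, y_tr, X_te, y_te = X[:cut], y[:cut], X[cut:], y[cut:]
X_bal, y_bal = random_oversample(X_tr, y_tr)  # resample the training set only

base_fitters = [fit_logreg, lambda Xs, ys: fit_logreg(Xs, ys, lr=0.5, epochs=30)]
model = stack(base_fitters, fit_logreg, X_bal, y_bal, k=5)
test_auc = auc(y_te, [model(x) for x in X_te])
print(f"stacked test AUC: {test_auc:.4f}")
```

One design caveat applies to any pipeline that oversamples before drawing the cross-validation folds: duplicated minority rows can fall into both a fold's training part and its held-out part, which makes the out-of-fold scores somewhat optimistic. Resampling inside each training fold avoids this leakage. In a real implementation, the entries of `base_fitters` and the meta fitter would be replaced by the boosting models named above.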
Keywords/Search Tags: personal loans, risk of default, LightGBM, Stacking fusion model