
Network Credit Default Prediction Based On Ensemble Learning

Posted on: 2024-01-11
Degree: Master
Type: Thesis
Country: China
Candidate: M J Xu
GTID: 2568307106986099
Subject: Applied statistics
Abstract/Summary:
Personal credit is one of the main lines of business for financial institutions. However, because borrowers differ in economic circumstances and spending habits, the risk of personal credit default is also high, and defaults can seriously harm a financial institution's economic interests. According to CBRC statistics, at the end of 2022 the balance of non-performing loans held by banking financial institutions was 3.8 trillion yuan, an increase of 169.9 billion yuan over the beginning of the year. Accurately predicting customer defaults, and thereby preventing further growth in non-performing loans, has therefore become an important research direction and is of great significance for risk management and decision-making at financial institutions.

This study takes the desensitized, simulated online credit dataset provided by Zhongyuan Bank as its research object. It first describes the research background and purpose of personal credit default prediction, together with the relevant theoretical and algorithmic foundations. It then computes descriptive statistics for the data, uses exploratory data analysis to examine the relationship between the feature variables and the label, and cleans the data, which includes converting and encoding text features and handling missing values and outliers. After derived features are constructed, features are selected with a filter method and an embedded method. Filtering removes five features, f0, f5, pub_dero_bankrup, sub_class, and house_Exist, which greatly improves model performance. Six sampling methods, including random oversampling, SMOTE, ADASYN oversampling, and SMOTE-Tomek Links combined sampling, are then compared; SMOTE-Tomek Links combined sampling, which yields the highest AUC under the LightGBM model, is chosen to handle the class imbalance.

Several machine learning algorithms are then used to predict whether online borrowers will default: random forest, which is based on bagging, and XGBoost, CatBoost, and LightGBM, which are based on boosting. During training, hyperparameters are tuned by grid search: each parameter combination is evaluated with 50-fold cross-validation, and the best-performing combination is used as the final model's hyperparameter setting. In addition, weighted Voting and Stacking are used to ensemble multiple models and improve performance. In the weighted Voting model of this thesis, random forest, XGBoost, CatBoost, and LightGBM serve as base models, and their outputs are combined by weighted soft voting. In the Stacking model, the same four base models (random forest, CatBoost, LightGBM, and XGBoost) form the first layer; the new features they generate become the input to the second layer, which outputs the final predictions. Accuracy, precision, recall, F1 score, KS value, and AUC are used as evaluation criteria to compare the six models above.

Comparing the experimental results on the dataset shows that XGBoost performs best on recall, LightGBM performs best on accuracy and precision, and CatBoost performs evenly across all metrics among these three boosting models. The Voting model's metric values lie between those of the four base models, while the Stacking model achieves the highest values of the core metrics, AUC and KS, among all models. This indicates that the Stacking model combines the strengths of the four algorithms, improving the model's other metrics while maintaining accuracy. The Stacking model is therefore selected as the final default risk prediction model for predicting whether a borrower will default. This research provides a feasible method for banks and other financial institutions to predict customer defaults accurately and thereby reduce credit risk.
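The abstract lists converting and encoding text features and handling missing values and outliers as cleaning steps, but it does not say which techniques were used. The standard-library sketch below assumes three common choices: median imputation, capping at the Tukey (IQR) fences, and integer label encoding. The function names and example categories are hypothetical.

```python
import statistics

def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]

def cap_outliers_iqr(values, factor=1.5):
    """Clip values to the Tukey fences [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [min(max(v, low), high) for v in values]

def label_encode(categories):
    """Map each distinct text category to an integer code (sorted order)."""
    mapping = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [mapping[c] for c in categories], mapping

# Hypothetical example values, not taken from the Zhongyuan Bank dataset.
print(impute_median([1, None, 3]))
print(cap_outliers_iqr([10, 11, 12, 13, 14, 15, 16, 17, 18, 1000]))
print(label_encode(["rent", "own", "rent", "mortgage"]))
```

`statistics.quantiles` uses its "exclusive" method by default; other quantile definitions would move the fences slightly.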
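The abstract does not name the statistic behind its filter method. One common filter ranks each feature by the absolute Pearson correlation with the default label and drops those below a cut-off; the sketch below assumes that rule, with a hypothetical threshold of 0.05 and hypothetical column names. The embedded method (for example, importances from a fitted tree model) is not shown.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)

def filter_features(columns, labels, min_abs_corr=0.05):
    """Split feature names into (kept, dropped) by |corr(feature, label)|.

    min_abs_corr is a hypothetical cut-off, not a value from the thesis.
    """
    kept, dropped = [], []
    for name, values in columns.items():
        bucket = kept if abs(pearson(values, labels)) >= min_abs_corr else dropped
        bucket.append(name)
    return kept, dropped

# Hypothetical columns; the thesis's own dropped features are not reproduced here.
labels = [0, 0, 1, 1, 0, 1]
columns = {"f_signal": [1.0, 1.2, 3.1, 2.9, 0.8, 3.3], "f_constant": [5, 5, 5, 5, 5, 5]}
print(filter_features(columns, labels))
```

Pearson correlation only captures linear association, which is one reason filter methods are often paired with an embedded method.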
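SMOTE creates synthetic minority-class (defaulter) samples by interpolating between a minority sample and one of its nearest minority-class neighbours. A minimal standard-library sketch follows; the parameter names `k` and `seed` are hypothetical, and a production pipeline would more likely call an existing implementation such as imbalanced-learn's `SMOTE`.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority-class samples by SMOTE interpolation.

    Each new point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # a point has at most len - 1 neighbours
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: sq_dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical two-feature minority class.
print(smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], n_new=3, k=2))
```

Resampling is normally applied to the training data only, so synthetic points never appear in the validation folds used for scoring.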
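SMOTE-Tomek Links combined sampling follows SMOTE oversampling with a cleaning pass that removes Tomek links: pairs of opposite-class samples that are each other's nearest neighbour, which tend to sit on a noisy class boundary. The sketch below assumes both members of each link are dropped; removing only the majority-class member is another common variant, and the abstract does not say which one the thesis used.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def tomek_links(X, y):
    """Return index pairs (i, j), i < j, of opposite-class mutual nearest neighbours."""
    nearest = [
        min((j for j in range(len(X)) if j != i), key=lambda j: sq_dist(X[i], X[j]))
        for i in range(len(X))
    ]
    return [(i, j) for i, j in enumerate(nearest)
            if i < j and nearest[j] == i and y[i] != y[j]]

def remove_tomek_links(X, y):
    """Drop both samples of every Tomek link and return the cleaned data."""
    drop = {idx for pair in tomek_links(X, y) for idx in pair}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]

# Hypothetical one-feature data: points 0 and 1 form a cross-class link.
X = [[0.0], [0.1], [5.0], [5.2], [10.0]]
y = [0, 1, 0, 0, 1]
print(tomek_links(X, y), remove_tomek_links(X, y))
```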
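The abstract describes tuning hyperparameters by grid search, scoring every parameter combination with cross-validation and keeping the best. The sketch below shows that loop in plain Python under stated assumptions: because the thesis's models (random forest, XGBoost, CatBoost, LightGBM) come from external libraries, a hypothetical k-nearest-neighbour classifier stands in for them, the score is held-out accuracy rather than a metric named in the thesis, and folds are contiguous and unshuffled for determinism.

```python
import itertools

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def knn_predict(train_X, train_y, x, n_neighbors):
    """Hypothetical stand-in model: majority vote of the nearest training points."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    votes = [train_y[i] for i in order[:n_neighbors]]
    return 1 if 2 * sum(votes) > len(votes) else 0  # ties go to class 0

def cv_accuracy(X, y, params, k):
    """Mean held-out accuracy of one hyperparameter combination over k folds."""
    scores = []
    for fold in k_fold_indices(len(X), k):
        held = set(fold)
        train = [i for i in range(len(X)) if i not in held]
        tX, ty = [X[i] for i in train], [y[i] for i in train]
        correct = sum(knn_predict(tX, ty, X[i], **params) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)

def grid_search(X, y, grid, k=5):
    """Try every combination in `grid`; return (best_params, best_cv_score)."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for combo in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = cv_accuracy(X, y, params, k)
        if score > best_score:  # strict '>' keeps the first of equally good combos
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical toy data and grid.
X = [[float(i)] for i in range(10)]
y = [0] * 5 + [1] * 5
print(grid_search(X, y, {"n_neighbors": [1, 3]}, k=5))
```

With real data one would shuffle (and usually stratify) before splitting, and `k` would be set to 50 to match the abstract.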
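In weighted soft voting, each base model's predicted default probability is multiplied by a weight, the weighted probabilities are averaged, and the average is compared with a threshold. The abstract does not report the weights or the threshold, so the values below are hypothetical.

```python
def weighted_soft_vote(prob_lists, weights, threshold=0.5):
    """Blend per-model default probabilities with a weighted average.

    prob_lists: one list of P(default) per base model, all of equal length.
    weights: one non-negative weight per base model.
    Returns (blended probabilities, 0/1 predictions at `threshold`).
    """
    if len(prob_lists) != len(weights) or any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("need one non-negative weight per model, not all zero")
    total = sum(weights)
    blended = [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(len(prob_lists[0]))
    ]
    return blended, [1 if p >= threshold else 0 for p in blended]

# Hypothetical probabilities for two borrowers from four base models
# (ordered as random forest, XGBoost, CatBoost, LightGBM) and hypothetical weights.
probs = [[0.30, 0.80], [0.20, 0.70], [0.35, 0.60], [0.25, 0.90]]
blended, labels = weighted_soft_vote(probs, weights=[1, 2, 1, 2])
print(blended, labels)
```

Because the blend is a weighted average, its probabilities always lie between the smallest and largest base-model probabilities for each borrower.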
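The abstract describes a two-layer Stacking model whose first layer (random forest, CatBoost, LightGBM, XGBoost) produces new features for a second-layer model, but it does not name the second-layer learner or say how the first-layer features were generated. The sketch below assumes a common arrangement: first-layer predictions are made out of fold, so the second layer never learns from predictions on a model's own training rows, and logistic regression is the second layer. Two hypothetical toy models stand in for the four tree ensembles.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_knn_share(X, y, k=3):
    """Hypothetical base model: share of defaulters among the k nearest points."""
    def predict(x):
        nearest = sorted(range(len(X)), key=lambda i: sq_dist(X[i], x))[:k]
        return sum(y[i] for i in nearest) / len(nearest)
    return predict

def fit_centroid(X, y):
    """Hypothetical base model: relative closeness to the defaulter centroid.
    Assumes both classes are present in the training data."""
    def centroid(label):
        pts = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(pts) for col in zip(*pts)]
    c0, c1 = centroid(0), centroid(1)
    def predict(x):
        d0, d1 = sq_dist(x, c0), sq_dist(x, c1)
        return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5
    return predict

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(Z, y, lr=0.5, epochs=2000):
    """Assumed meta-learner: logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(Z[0]), 0.0, len(Z)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - t
            gw = [g + err * zi for g, zi in zip(gw, z)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return lambda z: sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)

def fit_stacking(X, y, base_fits, meta_fit, k=5):
    """Two-layer stacking trained on out-of-fold first-layer predictions."""
    n = len(X)
    oof = [[0.0] * len(base_fits) for _ in range(n)]
    for f in range(k):  # interleaved folds: sample i belongs to fold i % k
        train = [i for i in range(n) if i % k != f]
        tX, ty = [X[i] for i in train], [y[i] for i in train]
        for m, fit in enumerate(base_fits):
            model = fit(tX, ty)
            for i in range(f, n, k):
                oof[i][m] = model(X[i])
    meta = meta_fit(oof, y)  # second layer learns from first-layer outputs
    full = [fit(X, y) for fit in base_fits]  # refit first layer on all data
    return lambda x: meta([model(x) for model in full])

# Hypothetical one-feature data: larger values are more likely to default.
X = [[float(i)] for i in range(20)]
y = [0] * 10 + [1] * 10
predict = fit_stacking(X, y, [fit_knn_share, fit_centroid], fit_logistic)
print(round(predict([2.0]), 3), round(predict([17.0]), 3))
```

With library models, scikit-learn's `StackingClassifier` implements the same out-of-fold idea through its `cv` parameter.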
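The abstract compares models by accuracy, precision, recall, F1, KS, and AUC. These can be computed from first principles as below. The 0.5 decision threshold in the example is an assumption, and the pairwise AUC and per-threshold KS loops favour clarity over speed.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = default)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = len(y_true) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}

def auc(y_true, scores):
    """AUC as the probability that a random defaulter outscores a random
    non-defaulter, counting ties as one half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def ks_statistic(y_true, scores):
    """KS value: the largest gap |TPR - FPR| over all score thresholds."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best = 0.0
    for thr in sorted(set(scores)):
        tpr = sum(1 for s, t in zip(scores, y_true) if t == 1 and s >= thr) / n_pos
        fpr = sum(1 for s, t in zip(scores, y_true) if t == 0 and s >= thr) / n_neg
        best = max(best, abs(tpr - fpr))
    return best

# Hypothetical labels and predicted default probabilities for four borrowers.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(classification_metrics(y_true, [1 if s >= 0.5 else 0 for s in scores]))
print(auc(y_true, scores), ks_statistic(y_true, scores))
```

On large datasets, sort-based routines such as scikit-learn's `roc_auc_score` and `roc_curve` (taking KS as the maximum of `tpr - fpr`) are the usual choice.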
Keywords/Search Tags: Credit default, Ensemble algorithm, Stacking