Forecasting Loan Default Based On Random Forest Model Fusion

Posted on: 2022-07-26 | Degree: Master | Type: Thesis
Country: China | Candidate: Y J Huang | Full Text: PDF
GTID: 2517306527952299 | Subject: Applied Statistics
Abstract/Summary:
Internet finance has made everyday life far more convenient. It makes credit card loans easier to obtain, but it also brings the risk of loan default. Understanding the behavioral characteristics of defaulting users is therefore key to reducing the financial risk borne by credit platforms. This thesis builds a binary classification model to predict whether a user will default on a loan. The main results are summarized as follows:

(1) Feature engineering and exploratory analysis. The raw data is preprocessed: missing values are filled and outliers are removed to obtain a complete data set. Ten variables with practical meaning are then derived by hand. An exploratory analysis of the variables follows, giving an intuitive picture of the relationship between the independent variables and the dependent variable and providing a basis for model construction.

(2) Treatment of unbalanced samples. The two classes are unbalanced in the training set. To keep this imbalance from distorting the predictions, the training samples are rebalanced separately by undersampling and by SMOTE sampling. The experiments show that models trained on the SMOTE-balanced data set predict better.

(3) Construction of the fusion model. To overcome the shortcomings of single-model prediction, the thesis takes xgboost, catboost and GBDT as base models and builds a Stacking fusion model with a random forest as the meta-model. The three base classifiers are first trained on the balanced data set, and their predicted probabilities for the dependent variable form a new training set. The meta-model is then fitted on this new training set to produce the final prediction. The experiments show that the fusion model predicts better than a single model: its AUC is 0.7252 and its accuracy is 80.50%.
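
The two rebalancing strategies compared in item (2) can be illustrated with a minimal, standard-library-only sketch. It is not the thesis's implementation: the function names `random_undersample` and `smote_oversample`, the toy data, the 0/1 class labels, and the choice of `k` are all assumptions made for illustration. Undersampling discards majority rows at random, while SMOTE creates synthetic minority rows by interpolating between a minority row and one of its nearest minority neighbours.

```python
import random
from math import dist


def random_undersample(X, y, majority=0, minority=1, seed=0):
    """Drop majority rows at random until both classes have equal counts."""
    rng = random.Random(seed)
    maj_idx = [i for i, label in enumerate(y) if label == majority]
    min_idx = [i for i, label in enumerate(y) if label == minority]
    keep = sorted(rng.sample(maj_idx, len(min_idx)) + min_idx)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_oversample(X, y, majority=0, minority=1, k=5, seed=0):
    """Add synthetic minority rows until both classes have equal counts.

    Assumes at least two minority rows, so every row has a neighbour.
    """
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_new = sum(1 for label in y if label == majority) - len(minority_rows)
    synthetic = []
    for _ in range(max(n_new, 0)):
        base = rng.choice(minority_rows)
        # k nearest minority neighbours of the base row, excluding itself
        neighbours = sorted(
            (row for row in minority_rows if row is not base),
            key=lambda row: dist(base, row),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return X + synthetic, y + [minority] * len(synthetic)


# Hypothetical toy data: six non-default rows (0) and two default rows (1).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.0], [0.0, 0.3],
     [1.0, 1.0], [1.2, 0.9]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_under, y_under = random_undersample(X, y)
X_smote, y_smote = smote_oversample(X, y, k=1)
print(y_under.count(0), y_under.count(1))  # 2 2
print(y_smote.count(0), y_smote.count(1))  # 6 6
```

Undersampling balances the classes by shrinking the training set to four rows, whereas SMOTE keeps all eight original rows and adds four synthetic ones. In either case only the training samples are resampled, as the abstract states.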
Keywords/Search Tags:Loan default prediction, Unbalanced sample, xgboost, catboost, GBDT, Random forest, Fusion model Stacking