
Research On Ensemble Credit Scoring Model For Imbalanced Data

Posted on: 2021-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Li
GTID: 2428330611957105
Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet finance, credit risk management is becoming more challenging. As the core of credit risk management, the credit scoring model aims to identify potentially delinquent borrowers and avoid financial risk, and it plays an important role in the financial industry. However, highly imbalanced credit customer data degrades the performance of credit scoring models, so handling imbalanced data effectively and improving model performance is an active research topic. This dissertation focuses on the highly imbalanced nature of credit customer data: the original data is first balanced, and an ensemble credit scoring model is then built on the balanced data. The main research contents and innovations are as follows:

(1) This dissertation proposes a data expansion method for imbalanced data based on a generative adversarial network (GAN). The method first learns the distribution of the original minority-class samples with a GAN and generates samples that match it. The generated samples are then filtered by Euclidean distance so that the retained samples are concentrated near the sample boundary, which enriches the boundary information. Finally, the original data is combined with the generated minority-class samples to obtain a balanced sample set. Experimental results show that the method effectively expands the data for different imbalanced data sets and achieves good results with ten base classifiers.

(2) Based on the balanced data, an Extreme Gradient Boosting (XGBoost) ensemble credit scoring model based on deep neural networks (DNN) is proposed. In the proposed model, bagging is first used to divide the training set into different training subsets. A feature extractor is then trained using a DNN, and the output of a specified hidden layer is used as the input of XGBoost to construct a base classifier. Finally, the outputs of the different base classifiers are combined by a simple probability average to generate the final predicted label. To verify its performance, the model was evaluated on three public credit datasets from the UCI Machine Learning Repository, and the effect of the number of DNN hidden layers on model performance was also examined. The results show that the accuracy of the model is significantly higher than that of the base classifiers, ensemble classifiers and variant models.
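The filtering and combination steps of contribution (1) can be illustrated with a minimal sketch. It assumes the GAN has already produced an array of candidate minority samples, and it assumes one plausible filtering rule: keep the candidates whose Euclidean distance to the nearest majority-class sample is smallest. The abstract does not state the exact rule, and the function names `filter_boundary_samples` and `balance` are hypothetical.

```python
import numpy as np

def filter_boundary_samples(generated, majority, n_keep):
    """Keep the n_keep generated samples closest to the majority class.

    Assumption: "closest to the boundary" is approximated by the distance
    to the nearest majority-class sample.
    """
    # Pairwise Euclidean distances, shape (n_generated, n_majority).
    diffs = generated[:, None, :] - majority[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # Distance from each generated sample to its nearest majority sample.
    nearest = dists.min(axis=1)
    # Smallest distances first: these samples lie nearest the other class.
    keep_idx = np.argsort(nearest)[:n_keep]
    return generated[keep_idx]

def balance(X_major, X_minor, generated):
    """Add filtered generated samples until both classes have equal size."""
    n_needed = len(X_major) - len(X_minor)
    if n_needed <= 0:
        X_new = np.empty((0, X_minor.shape[1]))
    else:
        X_new = filter_boundary_samples(generated, X_major, n_needed)
    X = np.vstack([X_major, X_minor, X_new])
    # Label 0 = majority class, label 1 = minority class (original + generated).
    y = np.concatenate([np.zeros(len(X_major)),
                        np.ones(len(X_minor) + len(X_new))])
    return X, y
```

If fewer candidates are supplied than are needed, the result stays imbalanced, so in practice the generator would be sampled well beyond the class gap before filtering.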
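The ensemble structure of contribution (2), bagging followed by a simple probability average, can be sketched as follows. This is an illustration under stated assumptions rather than the thesis's implementation: `CentroidBase` is a deliberately trivial placeholder for the base classifier (in the described model each base classifier is a DNN hidden-layer feature extractor feeding XGBoost), and all function names are hypothetical. Any object exposing `fit` and `predict_proba` could be substituted.

```python
import numpy as np

class CentroidBase:
    """Placeholder base classifier (NOT the DNN + XGBoost stage)."""

    def fit(self, X, y):
        self.c0 = X[y == 0].mean(axis=0)
        self.c1 = X[y == 1].mean(axis=0)
        return self

    def predict_proba(self, X):
        d0 = np.linalg.norm(X - self.c0, axis=1)
        d1 = np.linalg.norm(X - self.c1, axis=1)
        # The closer a point is to the class-1 centroid, the higher p1.
        p1 = d0 / (d0 + d1 + 1e-12)
        return np.column_stack([1.0 - p1, p1])

def fit_bagged_ensemble(X, y, make_base, n_estimators=5, seed=0):
    """Train one base classifier per bootstrap sample of the training set."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_estimators):
        # Bootstrap: draw len(X) row indices with replacement.
        idx = rng.integers(0, len(X), size=len(X))
        models.append(make_base().fit(X[idx], y[idx]))
    return models

def predict_proba_average(models, X):
    """Average the positive-class probabilities of all base classifiers."""
    probs = np.stack([m.predict_proba(X)[:, 1] for m in models])
    return probs.mean(axis=0)

def predict_label(models, X, threshold=0.5):
    """Turn the averaged probability into a 0/1 label."""
    return (predict_proba_average(models, X) >= threshold).astype(int)
```

Averaging probabilities rather than taking a majority vote keeps each base classifier's confidence in the final decision, which matches the "simple probability average" named in the abstract.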
Keywords/Search Tags:Credit scoring, Imbalanced data processing, Ensemble learning