
Research On Credit Scoring Method For Unbalanced Data

Posted on: 2022-05-20
Degree: Master
Type: Thesis
Country: China
Candidate: K Niu
Full Text: PDF
GTID: 2518306731977699
Subject: Computer technology
Abstract/Summary:
Most participants in the online lending industry are low- and middle-income people who have difficulty obtaining loans from traditional financial institutions such as banks. Online lending gives them a new option for borrowing and investing. However, the rapid development of the industry in recent years left hidden dangers, such as imperfect regulatory measures and imperfect credit scoring systems, and these have surfaced quickly, with a great impact on the industry's development. Constructing an effective credit scoring model is therefore key to resolving the current crisis in the online lending industry. Two main problems currently stand in the way. First, credit data is class-imbalanced. Class imbalance biases a model's predictions toward one class, and in the profit-oriented online lending industry such biased predictions cause economic losses to participants. Second, models trained on multiple types of data are difficult to update online. Online lending platforms have a large number of users and very frequent transactions, so the data distribution changes rapidly. If the model is not updated in time, its predictions become inaccurate.

For the first problem, this paper proposes a novel resampling ensemble model based on data distribution for imbalanced credit risk evaluation in online lending. The model has two parts: an undersampling method based on the distribution of the majority-class data, and the Bagging algorithm. The undersampling method obtains the distribution of the majority-class samples by clustering and then undersamples the majority class according to that distribution. This addresses the class imbalance while reducing the loss of majority-class information, which improves the classifier's ability to identify minority-class samples while maintaining a good ability to identify majority-class samples. In addition, combining this undersampling method with the Bagging algorithm further reduces the loss of majority-class information and improves the stability of the model. The proposed model is compared with several baseline models on three credit datasets. Experimental results show that it achieves higher AUC and G-mean values and better classification performance.

For the second problem, this paper proposes an online integrated credit scoring model. The model is a linear combination of two neural network components, one processing sparse categorical features and the other processing dense numerical features. The latter acquires its ability to process dense numerical features through knowledge distillation from a trained Gradient Boosting Decision Tree model, which handles dense numerical data well. After the two components are linearly combined, the model can handle sparse categorical features and dense numerical features simultaneously and can also be updated online. Experimental results on two timestamped credit datasets show that delayed model updates do degrade model performance, and that the proposed model achieves higher AUC values in both offline and online experiments, so it better addresses the model update problem.
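The undersampling-plus-Bagging idea can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the thesis's actual implementation: plain k-means stands in for the unspecified clustering step, each cluster's quota is set proportional to its size, and the names `kmeans`, `undersample_by_cluster`, and `make_bags` are hypothetical. The sketch only builds the balanced training sets. A base classifier would then be trained on each bag and the predictions aggregated.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns a cluster label for each row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members (unchanged if empty).
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def undersample_by_cluster(X_major, n_keep, k=3, seed=0):
    """Pick about n_keep majority rows, spread over clusters by cluster size."""
    rng = np.random.default_rng(seed)
    labels = kmeans(X_major, k, seed=seed)
    chosen = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if len(members) == 0:
            continue
        # Assumed rule: quota proportional to the cluster's share of the data.
        quota = max(1, round(n_keep * len(members) / len(X_major)))
        quota = min(quota, len(members))
        chosen.extend(rng.choice(members, size=quota, replace=False))
    return np.array(chosen)

def make_bags(X_major, X_minor, n_bags=5, k=3):
    """Build one roughly balanced training set per bag (label 1 = minority)."""
    bags = []
    for b in range(n_bags):
        idx = undersample_by_cluster(X_major, n_keep=len(X_minor), k=k, seed=b)
        X = np.vstack([X_major[idx], X_minor])
        y = np.concatenate([np.zeros(len(idx), dtype=int),
                            np.ones(len(X_minor), dtype=int)])
        bags.append((X, y))
    return bags
```

Because each bag draws a different subset of the majority class, the ensemble as a whole sees more majority-class samples than any single undersampled set would, which is the information-loss argument made above.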
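G-mean is reported alongside AUC because plain accuracy can look good on imbalanced data even when the minority class is never detected. The sketch below uses the standard definition, the geometric mean of sensitivity and specificity. The function name `g_mean` and the convention that label 1 marks the minority (default) class are assumptions for illustration.

```python
import math

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (recall on class 1) and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

# A classifier that always predicts the majority class is 90% accurate
# on this toy sample, but its G-mean is 0 because sensitivity is 0.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
print(g_mean(y_true, y_pred))
```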
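The distillation step for the dense-feature component can be sketched as follows. The abstract does not give the loss, the network architecture, or the GBDT configuration, so all of these are assumptions: `teacher_score` is a hypothetical stand-in for a trained GBDT (a fixed piecewise-constant function), the student is a one-hidden-layer network, and it is fitted to the teacher's outputs with a mean squared error loss by full-batch gradient descent.

```python
import numpy as np

def teacher_score(x_dense):
    """Hypothetical stand-in for a trained GBDT: piecewise-constant scores."""
    return 0.2 + 0.5 * (x_dense[:, 0] > 0.0) + 0.3 * (x_dense[:, 1] > 0.5)

def train_student(x, soft_targets, hidden=16, lr=0.05, epochs=1000, seed=0):
    """Fit a small tanh network to the teacher's outputs (assumed MSE loss)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    n = len(x)
    for _ in range(epochs):
        h = np.tanh(x @ W1 + b1)             # hidden activations
        err = (h @ w2 + b2) - soft_targets   # student output minus teacher
        # Gradients of 0.5 * mean(err ** 2) with respect to each parameter.
        grad_w2 = h.T @ err / n
        grad_b2 = err.mean()
        grad_h = np.outer(err, w2) * (1.0 - h ** 2)
        grad_W1 = x.T @ grad_h / n
        grad_b1 = grad_h.mean(axis=0)
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        w2 -= lr * grad_w2
        b2 -= lr * grad_b2

    def predict(z):
        return np.tanh(z @ W1 + b1) @ w2 + b2

    return predict
```

Unlike a tree ensemble, the resulting student is differentiable, so it can keep receiving gradient updates as new data arrives, which is the property the online model relies on.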
Keywords/Search Tags: Online Lending, Credit Scoring, Unbalanced Data, Neural Network, Model Optimization