Analysis And Research On Unbalanced Data Of Credit Score Based On Stacking Integrated Algorithm

Posted on:2021-05-04

Degree:Master

Type:Thesis

Country:China

Candidate:S Li

Full Text:PDF

GTID:2428330611970414

Subject:Engineering

Abstract/Summary:

In recent years, with the continuous expansion of China's personal consumer credit business, risk control in the financial sector has become crucial. Banks use credit scoring systems to evaluate and predict borrowers' repayment ability and personal credit. However, customers who become overdue after receiving a loan are a minority: in the data set a bank uses to build a credit scoring model, non-overdue samples (positive samples) far outnumber overdue samples (negative samples). Such a data set is called an unbalanced data set. A credit scoring model built on an unbalanced data set is biased towards the majority class (non-overdue customers), so it tends to misclassify minority class samples, and its recognition rate for the minority class (overdue customers) is lower.

To address this problem, the SMOTE oversampling algorithm is used to balance the data set. SMOTE generates new minority class samples from the existing minority class samples (overdue customer samples) in the data set. However, the newly generated samples may blur the classification boundary between positive and negative samples and reduce the model's classification performance. To address this boundary blurring, the MODIFIED-SMOTE oversampling algorithm is proposed. First, the 15% of minority class samples closest to the classification boundary are removed. Then new minority class samples are generated; each time a new sample is generated, the KNN algorithm is used to determine whether it belongs to the minority class. A sample classified as minority is retained; otherwise it is discarded. This more effectively prevents newly generated samples from blurring the classification boundary and avoids generating erroneous samples.

From the perspective of the model, this paper proposes the SLRA-Stacking (MODIFIED-SMOTE Logistic Random Forest Adaboost Stacking) model for credit scoring. First, SLRA-Stacking combines the MODIFIED-SMOTE oversampling algorithm with the Stacking integrated algorithm, which makes it better suited to the unbalanced nature of credit scoring data sets. Second, to improve the performance of the integrated model, it weighs the advantages and disadvantages of each single model, achieves diversity among the base classifiers by combining different classification models, and feeds the predicted probabilities of the base models together with the original modeling attribute variables into a second-level learner to achieve stronger generalization ability.

In this paper, five models are trained (Logistic, Random Forest, Adaboost, Stacking and SLRA-Stacking), and conclusions are drawn by comparing each model's performance before and after the data set is balanced. Models trained on unbalanced data sets classify overdue customers less effectively than models trained on balanced data sets. Among the models trained on balanced data sets, SLRA-Stacking achieves better test results than the other models, is stable, and generalizes well. Therefore, SLRA-Stacking can meet banks' needs for personal credit scoring and has certain practical value.
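The three MODIFIED-SMOTE steps described in the abstract (trim the minority samples nearest the boundary, interpolate new minority samples, keep only those a KNN vote assigns to the minority class) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the thesis's implementation. The abstract does not say how "closest to the classification boundary" is measured, so the sketch assumes it means the smallest distance to any majority sample. The function name `modified_smote`, the helper `nearest`, and the parameters `k`, `drop_frac`, `seed` and `max_tries` are hypothetical.

```python
import math
import random


def nearest(point, pool, k):
    """Return the k points in pool closest to point (Euclidean distance)."""
    return sorted(pool, key=lambda q: math.dist(point, q))[:k]


def modified_smote(minority, majority, n_new, k=5, drop_frac=0.15,
                   seed=0, max_tries=10000):
    """Return (kept minority samples, accepted synthetic samples)."""
    rng = random.Random(seed)

    # Step 1 (assumption): boundary proximity = distance to the nearest
    # majority sample. Drop the drop_frac of minority samples closest to it.
    ranked = sorted(minority,
                    key=lambda p: min(math.dist(p, q) for q in majority))
    core = ranked[int(len(ranked) * drop_frac):]
    if len(core) < 2:
        raise ValueError("need at least two minority samples after trimming")

    # Labelled reference set for the KNN check: 1 = minority, 0 = majority.
    # The trimmed minority samples are excluded from this set.
    labelled = [(p, 1) for p in core] + [(q, 0) for q in majority]

    synthetic = []
    tries = 0
    while len(synthetic) < n_new and tries < max_tries:
        tries += 1
        # Step 2: standard SMOTE interpolation between a minority sample
        # and one of its k nearest minority neighbours.
        base = rng.choice(core)
        mate = rng.choice(nearest(base, [p for p in core if p is not base], k))
        gap = rng.random()
        candidate = tuple(b + gap * (m - b) for b, m in zip(base, mate))

        # Step 3: KNN majority vote; keep the candidate only if most of
        # its k nearest labelled samples are minority samples.
        votes = sorted(labelled,
                       key=lambda item: math.dist(candidate, item[0]))[:k]
        if 2 * sum(label for _, label in votes) > len(votes):
            synthetic.append(candidate)
    return core, synthetic


# Toy data: majority samples on a grid near the origin, minority samples
# clustered near (5, 5) plus one minority sample sitting next to the grid.
majority = [(x * 0.5, y * 0.5) for x in range(5) for y in range(5)]
minority = [(5.0, 5.0), (5.5, 5.2), (4.8, 5.1), (5.2, 4.7), (5.1, 5.4),
            (4.9, 4.6), (5.3, 5.0), (4.7, 4.9), (5.4, 5.5), (2.2, 2.1)]
core, synthetic = modified_smote(minority, majority, n_new=20)
print(len(core), len(synthetic))
```

In the toy data, the minority sample placed next to the majority grid is the one removed by the trimming step, and every synthetic sample is interpolated between the remaining clustered samples. Other definitions of boundary proximity (for example, the share of majority samples among a point's nearest neighbours) would fit the abstract's description equally well.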

Keywords/Search Tags:

Personal credit score, MODIFIED-SMOTE, Adaboost, SLRA-Stacking

Related items

1	Research On The Application Of Boosting Algorithm Based On Improved SMOTE In Personal Credit Evaluation
2	Research And Application Of Personal Credit Automatic Evaluation Method
3	Research On Commercial Bank Personal Credit Scoring Model Based On Data Mining Technology
4	Personal Credit Score Modeling And Analysis Based On Data Mining
5	Optimization Of P2P Personal Credit Risk Assessment Model Based On Data Mining Technology
6	Identification Of Personal Credit Risk Based On Stacking Fusion Model
7	SVM And Application Study In Personal Credit Scoring
8	Study On The Personal Credit Evaluation Based On The Modified Naïve Bayesian Method
9	Design And Implementation Of Personal Credit Reporting System Based On Blockchain
10	Research On Credit Evaluation Method Based On Mixed Sampling And Stacking Integration