
Personal Credit Scoring Based On Imbalanced Ensemble Classification

Posted on: 2021-03-07 | Degree: Master | Type: Thesis
Country: China | Candidate: C Wang
GTID: 2428330629488911 | Subject: Engineering
Abstract/Summary:
In personal credit scoring, classifying imbalanced samples is an open problem: good clients usually far outnumber bad ones. In this situation, the performance of traditional machine learning classifiers degrades considerably, because their predictions become biased toward the majority class. To address imbalanced sample classification in personal credit scoring, this thesis explores both adjusting the data distribution and constructing classifiers for imbalanced samples.

(1) A clustering oversampling algorithm based on noise filtering (CF-KM-SMOTE) is proposed. The algorithm first removes noise samples from the original imbalanced sample set, then clusters the remaining samples and performs SMOTE oversampling on the minority samples within each cluster. Because CF-KM-SMOTE confines the newly generated minority samples to the minority region, it can effectively mitigate the blurred class boundaries that SMOTE oversampling may produce.

(2) An ensemble learning approach to personal credit scoring based on CF-KM-SMOTE is proposed. This approach uses CF-KM-SMOTE to generate new bad-client samples during the iterations of ensemble learning. By dynamically changing the sample weights during the iterative Boosting process, it reduces the impact of sample imbalance on classification: the weights of bad-client samples, which are easily misclassified because there are so few of them, are increased. This effectively addresses the sample imbalance problem in personal credit scoring.

Experiments on UCI data sets with multiple imbalance ratios show that the proposed oversampling algorithm and ensemble learning method achieve better ROC curves and higher AUC, F1-measure, and G-mean. The proposed approach shows certain advantages in identifying bad clients.
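The abstract names the three stages of CF-KM-SMOTE (noise removal, clustering, per-cluster SMOTE) but not their exact rules. The following NumPy sketch illustrates the idea under assumed choices that are not taken from the thesis: a sample counts as noise when all of its k nearest neighbours belong to the other class, clustering is plain k-means, synthetic samples are allocated to clusters in proportion to their minority counts until the two classes are balanced, and each synthetic point is interpolated between a minority sample and one of its minority neighbours in the same cluster. The function names `knn_indices`, `filter_noise`, `kmeans` and `cf_km_smote` are hypothetical, and the toy data in the usage part is randomly generated, not a UCI data set.

```python
import numpy as np


def knn_indices(X, k):
    """Return, for every row of X, the indices of its k nearest rows (itself excluded)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbour
    return np.argsort(d, axis=1)[:, :k]


def filter_noise(X, y, k=5):
    """ASSUMED noise rule: drop a sample if all k neighbours carry the other label."""
    nn = knn_indices(X, k)
    noisy = np.all(y[nn] != y[:, None], axis=1)
    return X[~noisy], y[~noisy]


def kmeans(X, n_clusters, n_iter=50, rng=None):
    """Plain Lloyd's k-means (an assumed choice); returns one cluster label per row."""
    rng = np.random.default_rng(rng)
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dist, axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def cf_km_smote(X, y, minority=1, n_clusters=3, k=5, rng=0):
    """Hypothetical CF-KM-SMOTE sketch: filter noise, cluster, SMOTE inside each cluster."""
    rng = np.random.default_rng(rng)
    X, y = filter_noise(X, y, k)
    labels = kmeans(X, n_clusters, rng=rng)
    n_needed = np.sum(y != minority) - np.sum(y == minority)  # target: balanced classes
    # Only clusters holding at least two minority samples can host interpolation.
    groups = [X[(labels == c) & (y == minority)] for c in range(n_clusters)]
    groups = [g for g in groups if len(g) >= 2]
    if n_needed <= 0 or not groups:
        return X, y
    sizes = np.array([len(g) for g in groups], dtype=float)
    counts = np.floor(n_needed * sizes / sizes.sum()).astype(int)
    counts[np.argmax(sizes)] += n_needed - counts.sum()  # remainder to largest group
    synthetic = []
    for g, m in zip(groups, counts):
        kk = min(k, len(g) - 1)
        nn = knn_indices(g, kk)  # neighbours are searched within the cluster only
        for _ in range(m):
            i = rng.integers(len(g))
            j = nn[i, rng.integers(kk)]
            # SMOTE step: a random point on the segment between two minority samples
            synthetic.append(g[i] + rng.random() * (g[j] - g[i]))
    X_new = np.vstack([X, np.array(synthetic).reshape(-1, X.shape[1])])
    y_new = np.concatenate([y, np.full(len(synthetic), minority)])
    return X_new, y_new


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0.0, 1.0, (200, 2)),   # toy "good clients", label 0
                   rng.normal(4.0, 0.5, (20, 2))])   # toy "bad clients", label 1
    y = np.array([0] * 200 + [1] * 20)
    X_bal, y_bal = cf_km_smote(X, y, minority=1, n_clusters=3, k=5, rng=0)
    print("class counts after resampling:", np.bincount(y_bal))
```

Restricting interpolation to minority pairs inside one cluster is one way to keep synthetic samples in the minority region rather than on the class boundary. In an ensemble like the one described in (2), a resampler of this kind would be called inside each Boosting round before the weak learner is fitted; the abstract does not say exactly how the resampled data are combined with the Boosting weights.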
Keywords: Personal Credit Scoring, Imbalanced Sample Set, Classification, Oversampling, Ensemble Learning