Research On Unbalanced Classification In Bank Credit Scoring

Posted on: 2018-09-14
Degree: Master
Type: Thesis
Country: China
Candidate: L Chen
Full Text: PDF
GTID: 2348330536470708
Subject: Management Science and Engineering
Abstract/Summary:
Credit scoring is an important part of bank credit risk management. It is how banks evaluate a customer's credit status, ability to repay a loan, and future prospects, and it guides business decisions by mining customer information. In the current big data era, banks can obtain more and more customer credit data, so extracting the information hidden in that data to determine a customer's credit score is a critical issue for banks. In real bank credit data sets, customers with good credit usually far outnumber customers with bad credit, which makes bank credit scoring essentially an unbalanced classification problem. In unbalanced classification, the minority class is usually the focus of attention; in credit scoring, for example, banks care most about customers with bad credit. Distinguishing and identifying minority-class samples effectively is therefore the key to solving the unbalanced classification problem. Machine learning algorithms often fail to identify minority-class samples when applied to unbalanced data, so solving this problem effectively is the focus of this research.

At present, unbalanced classification is studied mainly at the data level and the algorithm level. At the data level, resampling methods such as random oversampling, ROSE, and SMOTE are used to balance the class distribution; at the algorithm level, ensemble learning algorithms are often used. To verify the effectiveness of resampling methods and ensemble learning algorithms on unbalanced classification, simulation experiments are run on four data sets from UCI and KEEL with different imbalance ratios. The results show that both resampling methods and ensemble learning algorithms can effectively improve the minority-class recognition rate of the classification model.

ROSE is a method for synthesizing data. By modifying its weight coefficient and combining it with random under-sampling, we obtain the RHS (Random Hybrid Sampling) method; using the classical AdaBoost algorithm as the ensemble learner then yields the RHSBoost (Random Hybrid Sampling Boosting) algorithm. The basic idea is to first obtain a balanced data set by random under-sampling and then use the improved ROSE method to synthesize additional artificial data, while AdaBoost adjusts the weights of misclassified minority-class samples and thereby strengthens the classifier. With a decision tree as the base classifier, we run experiments on a bank credit data set and compare RHSBoost with SMOTEBoost, the resampling methods, and the ensemble learning algorithms to demonstrate the feasibility and advantages of RHSBoost.
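The hybrid sampling step described in the abstract (random under-sampling to balance the classes, followed by ROSE-style synthesis of artificial data) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not specify the modified weight coefficient, so the `shrink` multiplier is a hypothetical stand-in, the per-feature Gaussian bandwidth uses Silverman's rule of thumb as an assumed default, and the function names and toy data are invented for the example. The AdaBoost stage is omitted.

```python
import random
import statistics


def random_undersample(X, y, minority=1, rng=None):
    """Drop majority rows at random until both classes have equal size."""
    rng = rng or random.Random(0)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    keep = min_idx + rng.sample(maj_idx, len(min_idx))
    return [X[i] for i in keep], [y[i] for i in keep]


def rose_style_synthesize(X, y, n_new, shrink=1.0, rng=None):
    """Draw n_new synthetic rows from Gaussian kernels centred on real rows.

    The per-class, per-feature bandwidth follows Silverman's rule of thumb
    (an assumption). `shrink` is a hypothetical multiplier and is NOT the
    weight coefficient used in the thesis, which the abstract leaves open.
    """
    rng = rng or random.Random(0)
    d = len(X[0])
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    bandwidth = {}
    for label, rows in by_class.items():
        n = len(rows)
        factor = (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))
        bandwidth[label] = [
            shrink * factor * statistics.pstdev([r[j] for r in rows])
            for j in range(d)
        ]
    labels = sorted(by_class)
    X_new, y_new = [], []
    for _ in range(n_new):
        label = rng.choice(labels)          # each class drawn with equal probability
        seed = rng.choice(by_class[label])  # a real row to perturb
        X_new.append([rng.gauss(seed[j], bandwidth[label][j]) for j in range(d)])
        y_new.append(label)
    return X_new, y_new


# Toy data (invented): 90 "good" customers (label 0), 10 "bad" customers (label 1).
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)] + \
    [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(10)]
y = [0] * 90 + [1] * 10

X_bal, y_bal = random_undersample(X, y, minority=1, rng=rng)
X_syn, y_syn = rose_style_synthesize(X_bal, y_bal, n_new=200, rng=rng)
print(len(X_bal), sum(y_bal), len(X_syn))  # prints: 20 10 200
```

In a boosting setting, a sampling step like this would typically be repeated inside each round before the weak learner is fitted, so that each decision tree sees a different balanced sample; whether the thesis applies it per round or once up front is not stated in the abstract.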
Keywords/Search Tags: Credit Scoring, Unbalanced Classification, Resampling, Boosting, RHSBoost