
Credit Scoring Algorithm Based On Imbalanced Data Processing And Feature Selection

Posted on: 2020-04-27
Degree: Master
Type: Thesis
Country: China
Candidate: S Li
Full Text: PDF
GTID: 2428330590481888
Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet finance, many banking institutions and online lending platforms have been exposed to growing credit risk. Credit scoring is an effective tool that uses customers' information and activity data to identify potential risks, and it plays an essential role in financial institutions. This thesis takes into account three characteristics of credit customer data: it is massive, high-dimensional and imbalanced. After balancing the data and selecting features, we build a credit risk assessment model based on ensemble learning to assess the risk of credit customers. The contributions of this thesis are as follows:

(1) We propose a selective mixed sampling method for imbalanced credit data, abbreviated Se_MS. By analyzing how the credit customer samples of the different classes are distributed, we selectively choose both minority samples and majority samples, which addresses the unreasonable risk assessments caused by the imbalanced distribution of credit customer samples. The experimental results show that when a C4.5 risk-assessment model is applied to credit customer data processed by Se_MS, its F-measure and G-mean increase by 6% and 7%, respectively, compared with the SD_ISMOTE method.

(2) We propose a credit feature selection method that combines multiple filters with the NSD (new separable degree) index, abbreviated MFN. A single filter used for feature selection can easily overlook part of the information carried by a credit customer's features. MFN avoids this problem by measuring and evaluating the importance of the features from multiple perspectives. The experimental results show that, with the optimal feature subset selected by MFN, the accuracy rate increases by 11.8% compared with the SFS-LW method (a single filter). Compared with methods that combine multiple filters with a wrapper, MFN improves the time efficiency of feature selection by 30%-80%.

(3) Drawing on both static integration and dynamic-selection integration, we propose two ensemble-learning models for assessing the risk of credit customers: the FS-Bagging model, based on static integration, and the FBK model, based on dynamic-selection integration. The experimental results show that the FBK model performs best in the risk assessment of credit customers. Its AUC and ACC+ increase by 2% and 2.5%, respectively, compared with the Un-Ext-GDBT model.

In summary, the model proposed in this thesis, which integrates imbalanced-data processing and feature selection, makes the risk assessment of credit customers more accurate. The model can help financial institutions avoid risks reasonably and reduce losses, and it can also provide valuable decision-making guidance for practical credit risk management.
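The abstract reports results in terms of F-measure and G-mean, two metrics commonly used when classes are imbalanced. The sketch below shows how they are usually computed from a binary confusion matrix. It is a minimal illustration and not the thesis's evaluation code: the function names, the choice of label 1 for the minority (risky) class, and the toy labels are all assumptions made for this example.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary problem (hypothetical helper)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return recall, specificity, precision, F-measure and G-mean."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # accuracy on the positive class
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # accuracy on the negative class
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    g_mean = math.sqrt(recall * specificity)
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "g_mean": g_mean,
    }


# Toy data (assumed): 2 risky customers (label 1) among 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = imbalance_metrics(y_true, y_pred)
print(metrics)
```

On this toy data plain accuracy is 80%, yet only half of the risky customers are detected. F-measure and G-mean both reflect that weakness on the minority class, which is why they are preferred over accuracy for imbalanced credit data.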
Keywords/Search Tags: Credit scoring, Imbalanced data processing, Feature selection, Ensemble learning