Research On Credit Risk Evaluation Under Unbalanced Data Set Based On Integrated Learning

Posted on: 2022-12-31
Degree: Master
Type: Thesis
Country: China
Candidate: C Yang
Full Text: PDF
GTID: 2518306608468944
Subject: FINANCE
Abstract/Summary:
With the continuous development of the Internet finance industry, online lending has gradually entered people's lives as a new financing method. However, as the number of online lending platforms has grown in recent years, platform quality has become uneven and loan defaults have occurred frequently, which has had a strongly negative impact on the online lending industry. Reducing the risk of credit default is therefore an urgent problem. Most data from online lending platforms have a high feature dimension and an unbalanced proportion of positive and negative samples, which makes credit risk modeling challenging. To address these problems, this thesis studies three aspects: feature selection, data imbalance processing, and the classification algorithm, in order to establish a credit risk prediction model that evaluates the credit default risk of borrowers. Firstly, in the feature selection stage, a feature selection method combining a Filter approach with Random Forest is proposed. This method first uses an improved Relief algorithm to perform a preliminary screening of features, and reduces the impact of data imbalance by increasing the attention paid to minority samples and boundary samples. It then applies the maximal information coefficient method to eliminate redundant features. To obtain the best feature subset, the Random Forest algorithm is further used to filter the features and produce the final result. Secondly, an improved oversampling method is proposed for the problem that imbalanced class proportions affect the classification results of the model. Based on the Borderline-SMOTE method, this method introduces the idea of adaptive density to synthesize a reasonable number of new samples for each boundary minority-class sample, and uses an improved interpolation method to bring the interpolation area of the new samples closer to the original minority-class samples, so as to avoid blurring the class boundary. Finally, to further improve the classification performance of the model, a credit risk assessment model based on ensemble learning is proposed. The model first uses the Focal loss function to replace the loss function of the LightGBM algorithm and uses the resulting algorithm as the base classifier; it then combines the random subspace method and the AdaBoost algorithm to integrate the base classifiers into a credit risk assessment model. An empirical study based on loan data from the Lending Club online lending platform, together with a comparative analysis against other ensemble classification models, confirms the effectiveness of the proposed oversampling method and ensemble classification model, and indicates that they are better suited to credit risk assessment on imbalanced data sets.
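The abstract does not give the exact form of the Focal loss used with the boosting base classifier, so the following is only a minimal sketch under stated assumptions: the common binary focal loss with a class weight `alpha` and focusing parameter `gamma`, written as a function of the raw score `z` with `p = sigmoid(z)`. The function names, default values, and the finite-difference Hessian are illustrative choices, not the thesis's implementation. A custom objective for a gradient boosting library would typically need the gradient and Hessian with respect to the raw score, which is what the sketch computes.

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping raw scores to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))

def focal_loss(z, y, alpha=0.25, gamma=2.0):
    """Per-sample binary focal loss for raw scores z and labels y in {0, 1}.

    alpha and gamma are assumed hyperparameters (not taken from the thesis).
    """
    p = np.clip(sigmoid(z), 1e-12, 1 - 1e-12)
    pos = -alpha * (1 - p) ** gamma * np.log(p)        # loss when y == 1
    neg = -(1 - alpha) * p ** gamma * np.log(1 - p)    # loss when y == 0
    return np.where(y == 1, pos, neg)

def focal_grad(z, y, alpha=0.25, gamma=2.0):
    """Analytic derivative of focal_loss with respect to the raw score z."""
    p = np.clip(sigmoid(z), 1e-12, 1 - 1e-12)
    pos = alpha * (1 - p) ** gamma * (gamma * p * np.log(p) + p - 1)
    neg = (1 - alpha) * p ** gamma * (p - gamma * (1 - p) * np.log(1 - p))
    return np.where(y == 1, pos, neg)

def focal_hess(z, y, alpha=0.25, gamma=2.0, eps=1e-4):
    """Second derivative w.r.t. z, approximated by central differences
    of the analytic gradient (a simplification chosen for this sketch)."""
    g_plus = focal_grad(z + eps, y, alpha, gamma)
    g_minus = focal_grad(z - eps, y, alpha, gamma)
    return (g_plus - g_minus) / (2 * eps)
```

With `gamma = 0` and `alpha = 0.5` the gradient reduces to half the usual log-loss gradient `p - y`, which is a convenient sanity check. Larger `gamma` values shrink the contribution of samples the model already classifies confidently, so training concentrates on hard samples, which in an imbalanced credit data set are often the minority (default) class.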
Keywords/Search Tags: credit risk, unbalanced data, integrated learning, Borderline-SMOTE, AdaBoost algorithm
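For readers unfamiliar with the Borderline-SMOTE baseline that the oversampling method builds on, here is a minimal NumPy sketch of the standard Borderline-SMOTE1 idea. It does not reproduce the adaptive-density sample allocation or the improved interpolation described in the abstract, whose details are not given here. The function name `borderline_smote` and the parameters `n_new`, `m`, `k`, `minority`, and `seed` are hypothetical, chosen for illustration.

```python
import numpy as np

def borderline_smote(X, y, n_new, m=5, k=5, minority=1, seed=0):
    """Return n_new synthetic minority samples built from borderline points.

    A minority sample counts as borderline ("danger") when at least half,
    but not all, of its m nearest neighbours belong to the majority class.
    Assumes no exact duplicate rows, so each sample is its own nearest point.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min = X[y == minority]

    # Distances from every minority sample to every sample in the data set.
    d_all = np.linalg.norm(X_min[:, None, :] - X[None, :, :], axis=2)
    # m nearest neighbours; column 0 is the sample itself and is skipped.
    nn_all = np.argsort(d_all, axis=1)[:, 1:m + 1]
    n_major = (y[nn_all] != minority).sum(axis=1)
    danger = (n_major >= m / 2) & (n_major < m)
    X_danger = X_min[danger]

    k_eff = min(k, len(X_min) - 1)
    if len(X_danger) == 0 or k_eff < 1:
        return np.empty((0, X.shape[1]))

    # k nearest minority neighbours of each borderline sample.
    d_min = np.linalg.norm(X_danger[:, None, :] - X_min[None, :, :], axis=2)
    nn_min = np.argsort(d_min, axis=1)[:, 1:k_eff + 1]

    # Pick a borderline sample and one of its minority neighbours at random,
    # then interpolate on the segment between them.
    base = rng.integers(0, len(X_danger), size=n_new)
    pick = rng.integers(0, k_eff, size=n_new)
    neighbour = X_min[nn_min[base, pick]]
    gap = rng.random((n_new, 1))
    return X_danger[base] + gap * (neighbour - X_danger[base])
```

The returned rows would be appended to the training set with the minority label. Because each synthetic point is a convex combination of two existing minority samples, it stays inside the region spanned by the minority class; the thesis's variant additionally adjusts how many points each borderline sample receives and where on the segment they fall.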