
Research On The Application Of Ensemble Learning In Online Loan Risk Control

Posted on: 2022-12-24
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Zhan
GTID: 2480306779969509
Subject: Investment
Abstract/Summary:
With the rapid development of China's Internet credit market, the scale and amount of online credit transactions have risen significantly, and both credit methods and the groups served by credit business have become more complex and diversified. Risk control is the core of a financial enterprise, and effective risk prevention is a key factor in maintaining stable returns, so identifying potential defaulters in advance is important for reducing credit risk. From the perspective of misclassification penalties, misclassifying a defaulting customer costs more than misclassifying a performing customer, so risk prevention should focus on how accurately the model classifies the defaulting customers in the sample. In normal credit business, customer defaults are rare, so credit data has an imbalanced distribution of positive and negative samples, and traditional classification models may classify the minority samples inaccurately. Mitigating the impact of imbalanced data on the model and building a more accurate and efficient pre-loan default prediction model therefore has important reference value for reducing the credit risk of Internet financial companies and traditional credit institutions such as banks.

With this background and purpose in mind, this paper takes pre-loan credit assessment in the field of financial risk control as its research object. Most classification models previously built on the Lending Club public data set predict loan behavior, and they have problems such as relying on a single type of model and not considering the sample imbalance of the data set. Building on previous research, this paper explores pre-loan default prediction and evaluates the borrower's credit status with a variety of models in order to predict the probability of future default, screen out potential defaulting customers, and avoid risk at the source of borrowing. This paper also focuses on sample imbalance in the data set, aiming to improve the model's classification accuracy on minority samples at both the data level and the algorithm level, and to identify the "bad customer" samples in the customer group.

To address the high dimensionality, heavy noise, and imbalanced sample distribution of the credit data set, this paper combines oversampling with ensemble techniques to build a pre-loan default prediction model. First, feature engineering is used to reduce the dimensionality and noise of the credit data. To address the imbalance, Borderline SMOTE oversampling is used to balance the positive and negative samples. Traditional risk assessment relies on a single type of index, ignores the model's classification accuracy on minority samples, and does not consider the model's ability to distinguish risks. Building on the original evaluation indicators, this paper selects precision, recall, accuracy, F-score, AUC, and KS value to evaluate model performance comprehensively.

Comparing the performance of the models shows that the ensemble learning models (XGBoost, LightGBM, CatBoost) classify better than the traditional logistic regression model. Training on data processed by Borderline SMOTE oversampling also improves the classification accuracy for minority class samples at a small loss of overall accuracy. The feature importance output of each ensemble model shows that credit information features have a strong effect on classification, which can serve as a reference for banks, Internet finance companies, and China's credit reporting system in improving customer information more effectively. Finally, this paper selects the XGBoost model to build a credit application scorecard, converts the default probabilities output by the model into individual credit scores, and reports the proportion of bad samples in each score band, which Internet finance companies and traditional credit institutions can use as a reference when formulating credit strategy.
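
The scorecard step at the end of the abstract (turning a model's default probabilities into credit scores and reporting the bad-sample ratio per score band) might look roughly like the sketch below. The abstract does not state which scaling the thesis uses, so this assumes the common log-odds scaling with "points to double the odds" (PDO). The function names, the constants `base_score`, `base_odds`, and `pdo`, the band edges, and the toy data are all hypothetical.

```python
import math


def prob_to_score(p_default, base_score=600.0, base_odds=1 / 20, pdo=20.0):
    """Map a default probability to a credit score.

    Assumed scaling (not taken from the thesis): a customer whose
    bad:good odds equal `base_odds` receives `base_score`, and every
    doubling of the default odds lowers the score by `pdo` points.
    """
    eps = 1e-12
    p = min(max(p_default, eps), 1 - eps)  # keep the logarithm finite
    odds = p / (1 - p)                     # bad:good odds
    factor = pdo / math.log(2)
    return base_score - factor * math.log(odds / base_odds)


def bad_rate_by_band(scores, labels, edges):
    """Return (low, high, count, bad_rate) for each half-open band [low, high).

    `labels` uses 1 for a defaulting ("bad") customer and 0 otherwise.
    An empty band reports a bad rate of None.
    """
    rows = []
    for low, high in zip(edges[:-1], edges[1:]):
        in_band = [y for s, y in zip(scores, labels) if low <= s < high]
        rate = sum(in_band) / len(in_band) if in_band else None
        rows.append((low, high, len(in_band), rate))
    return rows


# Toy probabilities standing in for a model's predicted default probabilities.
probs = [0.02, 0.05, 0.10, 0.30, 0.60]
labels = [0, 0, 0, 1, 1]
scores = [prob_to_score(p) for p in probs]
bands = bad_rate_by_band(scores, labels, edges=[400, 500, 600, 700])

for low, high, count, rate in bands:
    print(f"[{low}, {high}): n={count}, bad_rate={rate}")
```

Under this scaling, higher scores mean lower predicted default risk, so a lender could set an approval cutoff at the band where the observed bad rate becomes unacceptable. Other scaling constants or band definitions would work equally well with the same structure.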
Keywords/Search Tags: pre-loan credit assessment, imbalanced data, logistic regression, ensemble learning