
Analysis Of Loan Default Prediction Based On Ensemble Learning Algorithm

Posted on: 2022-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Wang
Full Text: PDF
GTID: 2518306527952329
Subject: Applied Statistics
Abstract/Summary:
With the rapid development of the global economy, loans have become an important means of promoting consumption, supporting currency circulation, and helping individuals deal with financial difficulties. However, changing consumer attitudes and the growth of high-income business at commercial banks have been accompanied by an increase in financial risk events. Lending is a major source of such events and is among the highest-risk businesses at every major bank. It is therefore very important for financial institutions to mine the large volume of data on individual and corporate borrowers, identify the latent factors that affect loan default, and reduce or eliminate the risk events that may arise in the lending process. To that end, this thesis aims to build a more comprehensive index system for the loan business and to predict default using current mainstream ensemble learning algorithms.

The thesis covers four aspects. First, it introduces the background and significance of the research, reviews the traditional methods for predicting loan default in light of relevant financial business knowledge, and describes the ensemble learning classification methods used in the thesis. Second, it carries out descriptive statistical analysis and feature engineering on the collected loan data and draws simple preliminary hypotheses from visual analysis. Third, it builds loan default prediction models with mainstream ensemble learning algorithms and tunes their parameters by grid search to obtain the optimal model. Finally, a comprehensive comparison of accuracy, recall, F1 score, the ROC curve, and the area under it (AUC) shows that the CatBoost model performs better than the other models on this problem.

The main innovations are as follows. Features are constructed from different angles, which improves the index system and the generalization ability of the model. On the modeling side, compared with traditional machine learning methods, ensemble learning has clear advantages in efficiency, generalization ability, and automatic handling of categorical features. The interpretability of the model is improved by deriving feature importance from SHAP values. The results of the CatBoost model show that loan grade, loan term, loan sub-grade, the home ownership status reported by the borrower at registration, employment title, and other indicators are key for predicting in advance whether a borrower is at risk of default; business personnel should pay attention to them when helping customers through loan procedures.
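The evaluation criteria named in the abstract (accuracy, recall, F1 score, and the area under the ROC curve) can be sketched as follows. This is a minimal illustration in plain Python, not the thesis's code: the labels and scores are made up, the function names and the 0.5 decision threshold are assumptions, and 1 is taken to mean "default".

```python
def classification_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, precision, recall and F1 at a fixed score threshold.

    y_true holds 0/1 labels (1 = default, an assumed convention);
    y_score holds predicted default probabilities.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, y_score):
    """Area under the ROC curve via its rank interpretation.

    AUC equals the probability that a randomly chosen positive is scored
    higher than a randomly chosen negative, with ties counted as one half.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 3 defaulters and 5 non-defaulters.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_score = [0.9, 0.2, 0.4, 0.6, 0.1, 0.8, 0.3, 0.05]

metrics = classification_metrics(y_true, y_score)
auc = roc_auc(y_true, y_score)
print(metrics, auc)
```

Accuracy alone can be misleading when defaults are rare, which is one reason to read recall, F1, and AUC together when comparing models.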
Keywords/Search Tags:default prediction, Random Forest, XGBoost, LightGBM, CatBoost