
Research On Feature Engineering And Model Generalization Of Credit Default Prediction

Posted on: 2021-10-07
Degree: Master
Type: Thesis
Country: China
Candidate: P Qu
Full Text: PDF
GTID: 2518306113467044
Subject: Applied Statistics
Abstract/Summary:
P2P online lending is an innovative form of finance that can meet the borrowing needs of SMEs and individuals. In practice, however, P2P lending carries considerable risk: platforms face credit risk, technical risk, and compliance risk, and credit risk is among the most prominent. To address default risk in P2P online lending, this thesis studies the 2016-2019 data set of LendingClub, a US P2P lending platform.

Statisticians have long held that data and features determine the upper limit of machine learning, and that models and algorithms can only approach that limit. This thesis therefore places particular emphasis on feature engineering before model building. First, because many features in the raw data have missing values and most variables are unrelated to the target variable, the predictor variables are screened by manual analysis and by information value (IV); when IV is computed, numerical variables are binned with a decision tree that maximizes information gain. Second, to compare the predictive performance of different models, and because the data set contains string-valued (discrete) features, the thesis selects CatBoost and LightGBM, two algorithms that can handle discrete features directly, and also builds a stacking ensemble of the CatBoost and LightGBM models for comparison. Finally, because a model may show a recency effect, that is, a fitted model may predict a test set close in time to its training data markedly better than a more distant one, the thesis sets up a rolling prediction experiment to test this hypothesis and, at the same time, to find the number of months of training data that works best.

In the empirical study, the CatBoost, LightGBM, and stacking models are compared on the same training and test sets. LightGBM not only achieves good prediction accuracy but also generalizes well and runs quickly, so its overall performance is relatively good. Based on these results, P2P lending platforms can evaluate borrowers' credit by building machine learning models such as LightGBM, obtain effective early warning of a borrower's default probability, and thereby prevent or reduce default risk in P2P lending transactions and mitigate adverse selection and moral hazard. The research also offers a reference for the development of China's P2P industry and material for the country's macro-control and policy orientation.
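The IV screening step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names (`tree_bin_edges`, `information_value`), the tree depth, the minimum leaf size, the additive smoothing constant, and the synthetic `dti` feature are all assumptions made for illustration. The sketch bins a numerical feature by recursively choosing the cut point with the highest entropy-based information gain, then computes IV as the sum over bins of (good share - bad share) * ln(good share / bad share).

```python
import math
import random
from bisect import bisect_right


def entropy(pos, neg):
    """Binary entropy in bits of a node with `pos` defaults and `neg` non-defaults."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            result -= p * math.log2(p)
    return result


def best_split(pairs, min_leaf):
    """Index i that splits sorted (value, label) pairs into pairs[:i] and pairs[i:]
    with the highest information gain, or None if no split improves on the parent."""
    n = len(pairs)
    total_pos = sum(label for _, label in pairs)
    parent = entropy(total_pos, n - total_pos)
    best_gain, best_index = 0.0, None
    left_pos = 0
    for i in range(1, n):
        left_pos += pairs[i - 1][1]
        # Skip splits that leave a tiny leaf or fall between identical values.
        if i < min_leaf or n - i < min_leaf or pairs[i][0] == pairs[i - 1][0]:
            continue
        right_pos = total_pos - left_pos
        children = (i * entropy(left_pos, i - left_pos)
                    + (n - i) * entropy(right_pos, n - i - right_pos)) / n
        if parent - children > best_gain:
            best_gain, best_index = parent - children, i
    return best_index


def tree_bin_edges(values, labels, max_depth=2, min_leaf=5):
    """Sorted cut points from a depth-limited, information-gain-driven tree."""
    def recurse(pairs, depth):
        if depth == 0:
            return []
        i = best_split(pairs, min_leaf)
        if i is None:
            return []
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        return recurse(pairs[:i], depth - 1) + [threshold] + recurse(pairs[i:], depth - 1)

    return recurse(sorted(zip(values, labels)), max_depth)


def information_value(values, labels, edges, smoothing=0.5):
    """IV of a feature binned at `edges`; label 1 marks a default ("bad")."""
    n_bins = len(edges) + 1
    bad = [0] * n_bins
    good = [0] * n_bins
    for value, label in zip(values, labels):
        k = bisect_right(edges, value)
        if label == 1:
            bad[k] += 1
        else:
            good[k] += 1
    total_bad, total_good = sum(bad), sum(good)
    iv = 0.0
    for b, g in zip(bad, good):
        # Additive smoothing (an assumption of this sketch) avoids log(0) in pure bins.
        bad_share = (b + smoothing) / (total_bad + smoothing * n_bins)
        good_share = (g + smoothing) / (total_good + smoothing * n_bins)
        iv += (good_share - bad_share) * math.log(good_share / bad_share)
    return iv


# Synthetic data: default probability rises with a made-up debt-to-income feature,
# while `noise` is unrelated to the target.
rng = random.Random(0)
dti = [rng.uniform(0, 40) for _ in range(2000)]
default = [1 if rng.random() < 0.05 + 0.01 * x else 0 for x in dti]
noise = [rng.uniform(0, 40) for _ in range(2000)]

iv_dti = information_value(dti, default, tree_bin_edges(dti, default))
iv_noise = information_value(noise, default, tree_bin_edges(noise, default))
print(f"IV(dti)={iv_dti:.3f}  IV(noise)={iv_noise:.3f}")
```

A feature whose IV falls below a chosen cutoff would be dropped during screening; the cutoff itself is a modelling decision that the abstract does not state.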
Keywords/Search Tags: P2P lending, logistic regression, LightGBM algorithm, LendingClub dataset