Construction And Application Of P2P Loan Default Prediction Model Based On Stacking

Posted on: 2020-05-03
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Wang
Full Text: PDF
GTID: 2370330578965063
Subject: Applied statistics
Abstract/Summary:
In recent years, with the development of Internet finance and computer technology, the intermediary role of traditional financial institutions has declined. Internet-based financial management has become increasingly popular, and the public has gradually adopted P2P online lending platforms as an important channel for managing their finances. The domestic P2P online lending industry has grown rapidly, but the problems and risks that accompany this growth have gradually come to light. In 2018, China's P2P online lending platforms experienced a concentrated wave of collapses: large-scale customer defaults led to high bad-debt rates, and a large number of platforms ran into withdrawal difficulties or went bankrupt. How to accurately identify potential defaulters and reduce credit default risk has therefore become an urgent problem; only by handling borrower default properly can P2P online lending in China develop in a healthy way.

This thesis aims to identify potential defaulters on P2P platforms accurately by building a personal loan default prediction model, in order to reduce platform operating risk, improve China's Internet finance environment, and reduce Internet financial risk. Quantitative research on P2P platform risk in China is scarce, and existing work tends to apply a single machine learning algorithm without a multi-model fusion strategy. To address this, the thesis uses Python to crawl lending data from the Renrendai ("everyone's loan") platform, and uses Python, R and other analysis software to carry out data preprocessing and exploratory statistical analysis, including Cox survival analysis. After the imbalanced data are resampled with the Borderline-SMOTE algorithm, features are selected by combining information value (IV) with the Gini index, and six classical classification models are built: logistic regression, support vector machine, AdaBoost, XGBoost, random forest, and naive Bayes. Each model is tuned by grid search, and with the F2 score as the performance metric, four models are selected: logistic regression, support vector machine, AdaBoost, and XGBoost. Finally, these four models are combined with the Stacking algorithm to build the final loan default prediction model. The following conclusions are drawn:

1) Survival analysis of the loans shows that small loans are more likely to default than large loans, so P2P online lending platforms should strengthen the supervision and review of small-loan applications. In addition, borrowers are more likely to default as the loan approaches the end of its term, so before the repayment date the platform should pay special attention to the borrower's recent repayment performance and strengthen collection and supervision.

2) The class balance ratio of the training data affects model performance. The closer the training data are to a 1:1 balance, the worse the models perform; a training balance ratio of 1:3 is more favorable to model performance.

3) Without Stacking fusion, XGBoost performs best among the single models and is superior to the other classical classification models for personal loan default prediction.

4) The Stacking fusion model built on logistic regression, support vector machine, AdaBoost and XGBoost performs best among all models. This shows that the Stacking fusion model established in this thesis is a better-performing personal loan default prediction model, and the advantage of the fusion algorithm in this setting has reference value for applying model fusion to personal loan default prediction in China.

5) From the perspective of model application, the control variable method is used to study how the model's predicted default probability changes at different interest rates, with all other variables held fixed. By adjusting the borrowing rate, the corresponding predicted default probability can be brought down to a level the platform considers acceptable. This has positive significance for helping the platform convert high-risk applicants into acceptable customers.
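The abstract names the F2 score as the model selection metric but does not define it. As a minimal sketch, assuming the standard F-beta definition with beta = 2 and defaults coded as the positive class (1), the metric can be computed as below. The function name `fbeta_score_binary` and the toy labels are illustrative assumptions, not data or code from the thesis.

```python
def fbeta_score_binary(y_true, y_pred, beta=2.0):
    """F-beta score for the positive class (assumed here: 1 = default).

    With beta = 2, recall is weighted more heavily than precision, so a
    missed defaulter costs more than a false alarm.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defaulters caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defaulters missed
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative toy labels (hypothetical, not from the thesis data)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

f2 = fbeta_score_binary(y_true, y_pred, beta=2.0)
f1 = fbeta_score_binary(y_true, y_pred, beta=1.0)
print(f"F2 = {f2:.4f}, F1 = {f1:.4f}")
```

In this toy case recall (0.75) exceeds precision (0.60), so F2 comes out higher than F1. That is the behavior one would want from a metric used to rank default classifiers, where failing to flag a defaulter is usually the more expensive error.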
Keywords/Search Tags:P2P online loan, Machine learning, Survival analysis, Default prediction, Stacking fusion