Research On Default Prediction Model Of P2P Platform Based On Data Mining Methods

Posted on: 2021-01-05; Degree: Master; Type: Thesis
Country: China; Candidate: Y Chen; Full Text: PDF
GTID: 2507306107479944; Subject: Applied Statistics
Abstract/Summary:
P2P online lending is a product of the development of Internet finance and is a person-to-person lending method. Compared with traditional bank lending, the threshold for P2P lending is lower, and it plays an important role in easing the financing difficulties of small and medium-sized enterprises. However, because of the low loan threshold and the reliance on Internet technology, borrower identity is harder to verify and information asymmetry is more severe. How to effectively address online loan default is a major challenge in the development of China’s online lending platforms. This thesis uses data from the American P2P platform Lending Club covering 2007 to 2015; the dataset contains 252,970 samples and 78 variables describing the borrower’s basic information, financial status, transaction information, transaction status, and so on. After exploring the data and selecting effective features, we build a logistic regression model, a random forest model, and a fusion model that combines GBDT (Gradient Boosting Decision Tree) with logistic regression to predict loan default, and we compare the three approaches. Logistic regression is an effective method for binary classification; because it solves the classification problem through regression, the model is more interpretable. The random forest model builds multiple classification trees, assigns each sample its final class by voting, and can report feature importance. Compared with traditional Boosting, each GBDT iteration is fitted to reduce the residual left by the previous iteration, and GBDT can flexibly handle various types of data. In the data processing and feature engineering stage, we performed missing-value handling, one-hot encoding, feature selection, and principal component analysis for dimensionality reduction. The empirical results show that the logistic regression model and the GBDT and logistic regression fusion model can effectively handle imbalanced P2P loan data, whereas random forest does not perform well on such data.
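
The boosting idea mentioned in the abstract, where each round corrects the error left by the previous one, can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes squared loss (where the negative gradient is simply the residual), uses depth-1 trees (stumps), and runs on a tiny made-up dataset. All names (`fit_stump`, `boost`) and hyperparameters are hypothetical. A real default-prediction model would use a classification loss and a library implementation.

```python
# Toy gradient boosting with squared loss and depth-1 trees (stumps).
# Data, function names and hyperparameters are illustrative assumptions only.

def fit_stump(xs, residuals):
    """Find the single split on x that best fits the residuals (least squares)."""
    best = None
    # Candidate thresholds: every distinct x except the largest,
    # so that the right-hand leaf is never empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv = sum(left) / len(left)      # leaf value = mean residual on the left
        rv = sum(right) / len(right)    # leaf value = mean residual on the right
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return t, lv, rv


def stump_predict(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Fit stumps sequentially, each one to the residuals of the current model."""
    base = sum(ys) / len(ys)            # initial prediction: the mean of y
    preds = [base] * len(ys)
    stumps = []
    history = [mse(ys, preds)]          # training error before any boosting
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
        history.append(mse(ys, preds))  # training error after this round
    return base, stumps, history


# Made-up one-feature dataset with a noisy 0/1 target.
xs = list(range(10))
ys = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

base, stumps, history = boost(xs, ys)
print("training MSE before boosting:", history[0])
print("training MSE after boosting: ", history[-1])
```

In the fusion model named in the abstract, the trees of a fitted GBDT are commonly used to generate features for a logistic regression. The sketch above covers only the boosting step and makes no claim about how the thesis combines the two models.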
Keywords/Search Tags: Online lending, Default prediction, Data mining technique