Research on the Effectiveness of Personal Credit Assessment Models Based on P2P Platforms

Posted on: 2021-04-20
Degree: Master
Type: Thesis
Country: China
Candidate: J M Tan
GTID: 2428330620464359
Subject: Finance
Abstract/Summary:
P2P online lending is an important form of Internet financial innovation. As China tightens its supervision and the platform filing work moves toward completion, the country's P2P personal lending industry still lacks a comprehensive system for borrower credit assessment and risk identification. This thesis therefore focuses on borrower credit assessment on P2P platforms and, in particular, on the impact of personal information on credit assessment. It is based on data from Lending Club, a major American platform. By modeling and analyzing Lending Club borrower information with multiple algorithms, we evaluate the effectiveness of different algorithms for credit assessment and of different kinds of personal information.

The current mainstream methods for credit assessment are algorithms such as logistic regression, random forest, and support vector machines. Gradient boosting methods have not attracted enough attention. In particular, LightGBM, an efficient and fast improvement of the gradient boosting tree algorithm that has been widely used in data mining and related fields in recent years, is rarely studied in loan evaluation, and its performance has not been compared with that of other commonly used data mining algorithms. This thesis focuses on the LightGBM model, compares the performance of several algorithms, and concludes that LightGBM has advantages in both effectiveness and speed. We hope this provides a constructive suggestion on model adoption for China's online lending industry and thus promotes the healthy and orderly development of P2P lending.

The thesis uses the large body of public Lending Club data from 2007 to the fourth quarter of 2018. Through comprehensive exploratory data analysis and feature selection, we made a preliminary assessment of the effectiveness of different personal information. In addition, we used the SMOTE algorithm to address class imbalance so that the classification results are meaningful. We built four models (logistic regression, random forest, BP neural network, and LightGBM) and fit them on the same data. The test results show that LightGBM performs best, exceeding 90% on various metrics including accuracy, precision, recall, and F1. Although random forest performs only slightly worse than LightGBM, especially in AUC score, it takes much more time. In credit assessment, where timeliness matters, LightGBM therefore has higher practical value. Logistic regression and the BP neural network are faster than random forest but perform worse. Even so, all four models achieve an AUC score above 90%, greatly exceeding previous work on Lending Club data, which also reflects the effectiveness of our data processing and feature selection methods.

We also analyzed experimentally how the amount of data affects model performance. We used down-sampling to extract smaller data sets and analyzed how the performance of the four algorithms changes with the amount of data. The experiments show that logistic regression peaks at a certain amount of data and does not improve beyond it. The BP neural network improves overall as the amount of data increases, but its performance is unstable and fluctuates greatly. Random forest and LightGBM both show a clear upward trend as the amount of data increases, and LightGBM consistently outperforms random forest. This further shows the practicality of LightGBM for big data.

Finally, we conducted feature importance analysis on three algorithms (logistic regression, random forest, and LightGBM), identified the features that contribute most to each model's predictions using different methods, and compared them to reach some common conclusions about which kinds of personal information significantly affect loan credit assessment. Our analysis concludes that, among all personal information, the last payment amount, loan amount, and interest rate are the most important across these algorithms, especially the last payment amount, which has not been mentioned in previous work. Based on this conclusion, we hope to provide a useful reference for credit assessment in the online lending industry and to contribute to the construction of an efficient credit assessment system in China.
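
The abstract mentions SMOTE for handling the imbalance between defaulted and repaid loans but gives no implementation details. As a rough illustration of the idea only, the sketch below generates synthetic minority-class rows by interpolating between a minority sample and one of its nearest minority neighbours. The function name `smote_oversample`, the parameter values, and the toy data are all assumptions made for this example, not taken from the thesis; in practice a library implementation such as `imblearn.over_sampling.SMOTE` would normally be used, and typically on the training split only.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of equal-length numeric feature vectors (minority class only).
    This is a hypothetical helper for illustration, not the thesis's code.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base point,
        # excluding the base point itself (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        # New point lies on the segment between base and the chosen neighbour.
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy, made-up data: 3 minority rows versus an assumed 9 majority rows.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2]]
majority_count = 9
new_rows = smote_oversample(minority, majority_count - len(minority), k=2)
balanced_minority = minority + new_rows
print(len(balanced_minority), "minority rows after oversampling")
```

Because each synthetic row lies on a line segment between two real minority rows, the new rows stay inside the region already occupied by the minority class rather than duplicating existing rows exactly.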
Keywords/Search Tags: Loan credit assessment, P2P online lending, LightGBM, Personal information