
Research On P2P Loan Default Risk Based On Ensemble Learning

Posted on: 2021-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Sun
GTID: 2428330614954483
Subject: Applied statistics
Abstract/Summary:
As the most important branch of Internet finance, P2P lending meets the diverse needs of investors and borrowers and puts idle funds to more efficient use. The industry has passed through stages of emergence, rapid growth, and boom, which also exposed many problems, such as high bad-debt rates, platform failures, and illegal fund-raising. With government intervention, stronger platform management, and growing public attention to the safety of invested funds and the control of lending credit risk, the P2P industry has come under strict control and has gradually become more standardized. However, the government and the platforms focus mainly on risk control at the loan application stage, and supervision of post-loan risk is lacking. An effective P2P loan default risk prediction model is therefore of great significance for improving platforms' risk control ability and promoting the healthy and stable development of the industry.

This thesis first explains the development and current state of the P2P online lending business, analyzes the existing problems in the industry, and briefly describes the data processing methods and data mining models involved, including their principles and characteristics. Then, taking the Lending Club data set as an example, the loan records are collected and sorted, and data cleaning and feature engineering are carried out, including missing-value handling, data normalization, and feature extraction. On the cleaned data set we build a logistic regression model, an SVM model, and an extremely randomized trees (extra-trees) model. Each model is fitted with several different parameter settings, and the settings are compared using accuracy, recall, and F1 score as evaluation metrics to obtain the optimal parameters for that type of model. Finally, the optimally tuned models serve as base models and GBDT serves as the second-stage model, and the final combined default prediction model is built with the blending ensemble learning algorithm.

The results show that data mining models can effectively identify and predict P2P default risk. The ensemble model performs better than any of the single models, gives more help in identifying and controlling default risk, and offers a useful reference for where the P2P lending industry in China should focus its attention as it develops. The thesis also suggests that some personal and corporate credit data be shared within the industry to give strong support to loan risk identification, which would improve platforms' risk control ability, reduce their credit risk control costs, and promote the healthy development of the industry.
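The blending procedure described in the abstract can be illustrated with a minimal sketch. It is not the thesis's code. It assumes scikit-learn, uses synthetic data from `make_classification` as a stand-in for the Lending Club records, and picks illustrative hyperparameters and a 70/30 holdout split. The function names (`make_base_models`, `stack_probabilities`, `blend_fit_predict`, `evaluate`) are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_base_models(seed=0):
    """First-stage models named in the abstract (hyperparameters are assumptions)."""
    return [
        make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        make_pipeline(StandardScaler(), SVC(probability=True, random_state=seed)),
        ExtraTreesClassifier(n_estimators=100, random_state=seed),
    ]


def stack_probabilities(models, X):
    """One column per base model: predicted probability of the positive (default) class."""
    return np.column_stack([m.predict_proba(X)[:, 1] for m in models])


def blend_fit_predict(X_train, y_train, X_test, seed=0):
    """Blending: base models learn on one part of the training data, and the
    second-stage GBDT learns from their predictions on a disjoint holdout part."""
    X_base, X_hold, y_base, y_hold = train_test_split(
        X_train, y_train, test_size=0.3, stratify=y_train, random_state=seed
    )
    base_models = make_base_models(seed)
    for model in base_models:
        model.fit(X_base, y_base)

    # The meta-model never sees predictions made on rows the base models trained on.
    meta_model = GradientBoostingClassifier(random_state=seed)
    meta_model.fit(stack_probabilities(base_models, X_hold), y_hold)

    meta_test = stack_probabilities(base_models, X_test)
    return meta_model.predict(meta_test), meta_test


def evaluate(y_true, y_pred):
    """The three evaluation metrics mentioned in the abstract."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
    }


if __name__ == "__main__":
    # Synthetic, imbalanced stand-in data: roughly 20% "default" labels.
    X, y = make_classification(
        n_samples=600, n_features=10, weights=[0.8, 0.2], random_state=0
    )
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0
    )
    predictions, _ = blend_fit_predict(X_tr, y_tr, X_te)
    print(evaluate(y_te, predictions))
```

The holdout split is what distinguishes blending from simply averaging models: the second-stage learner is trained only on base-model outputs for rows those models did not see during fitting, which limits leakage into the combined model.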
Keywords/Search Tags:P2P, logistic regression, SVM, extreme random tree, Blending