
Default Prediction Of P2P Online Loan Users Based On Unbalanced Data

Posted on: 2021-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: M C Yuan
Full Text: PDF
GTID: 2518306455481914
Subject: Applied Statistics
Abstract/Summary:
Peer-to-peer (P2P) Internet lending is a new lending mode of the Internet age. It has grown rapidly thanks to its simple and convenient operation, fast lending, and low borrowing threshold. P2P Internet lending has made borrowing and lending more convenient and has helped fill gaps left by traditional financial institutions, but it has also led to high bad-debt rates and even to platform operators absconding. The main reason for these problems is that P2P platforms cannot effectively evaluate the default risk of borrowers. It is therefore necessary to build an effective default prediction model for P2P Internet lending users, so that platforms can identify likely defaulters and reduce the default risk of the P2P Internet lending industry. Based on real loan data from the Lending Club platform, this paper establishes a user default prediction model and focuses on the identification rate of default users. First, we clean the data and carry out feature engineering. On this basis, we apply SMOTE oversampling, random under-sampling, SMOTE-Tomek Links mixed sampling, and cost-sensitive learning to address the class imbalance problem in the Logistic regression, LightGBM, and CatBoost models respectively. Comparing the classification results before and after treatment, we find that the cost-sensitive Logistic regression model, the under-sampling LightGBM model, and the cost-sensitive CatBoost model significantly improve the identification rate of default users; their recall rates on the test set are 75.44%, 82.98%, and 86.66% respectively. We also fit a Self-paced Ensemble model, which is designed for class imbalance. Its recall rate on the test set is 81.79%, which indicates that it can handle the class imbalance problem. On the whole, the cost-sensitive CatBoost model is the most suitable for our data. Our research results can therefore provide some reference value for the P2P Internet lending industry, help P2P platforms reduce the default rate of users, and promote the sustainable and healthy development of the P2P Internet lending industry.
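As a rough illustration of two of the imbalance treatments named in the abstract (random under-sampling and cost-sensitive weighting) and of the recall metric used to compare models, the following is a minimal standard-library sketch. It is not the thesis code: the function names, the toy data, and the label convention (1 = default, 0 = non-default) are assumptions made for illustration, and the weight formula is the common "balanced" heuristic, which the abstract does not confirm was the one used.

```python
import random


def random_undersample(X, y, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority row plus an equally sized random subset of the majority.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]


def balanced_class_weights(y):
    """Cost-sensitive weights: n_samples / (n_classes * class_count).

    Assumed heuristic; rarer classes receive proportionally larger weights.
    """
    n = len(y)
    counts = {c: y.count(c) for c in set(y)}
    return {c: n / (len(counts) * k) for c, k in counts.items()}


def recall(y_true, y_pred):
    """Share of actual defaulters (label 1) that were predicted as defaulters."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy data: 8 non-default loans and 2 default loans.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_undersample(X, y)
weights = balanced_class_weights(y)
```

Under-sampling changes the training set itself, while class weights leave the data untouched and instead make errors on the rare default class cost more during training. Either way, recall on an untouched test set is the figure of interest when the goal is to catch as many defaulters as possible.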
Keywords/Search Tags:imbalanced data, P2P Internet lending, default prediction, Logistic regression, ensemble learning