P2P Online Loan Default Risk Early Warning Research

Posted on: 2021-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: D Zhou
GTID: 2438330626954329
Subject: Financial statistics and modeling
Abstract/Summary:
Over the past few years, P2P online lending platforms have expanded rapidly and with little regulation, but the recent wave of platform operators absconding and the frequent closures have raised widespread concern about the security and stability of China's P2P online lending market. P2P online lending faces many risks, among which credit risk is more prominent than in the traditional financial industry. The borrower's credit risk (i.e., default risk) is the risk that the borrower fails to perform the contract and cannot repay the principal and interest on time. Once a borrower defaults, lenders suffer direct financial losses and the platform's economic interests are harmed; large-scale defaults seriously affect the operation of the platform, shake investors' confidence, and restrict the development of the industry. At the same time, P2P online lending has a short history and lacks the rich customer information and operational management experience that commercial banks have, and its borrowers are numerous and scattered, so it is difficult to rely on the experience and judgment of internal experts for credit risk assessment. Platforms' credit risk assessment and early warning therefore face great challenges, and establishing a more accurate model for assessing and warning of borrower default risk has important practical significance for protecting the interests of investors, the safe operation of platforms, and the healthy development of the industry. At present, domestic and foreign research on predicting borrower default on P2P platforms has combined many machine learning algorithms, but it is relatively coarse in feature selection and missing-data processing in the early stage of prediction, and most studies use a single model for the final prediction, so the results may be neither accurate nor complete. This thesis therefore optimizes three aspects: feature selection, missing-value processing, and the prediction model.
The thesis first uses random forest importance ranking for feature selection. Because there are many variables and linear models are used later, correlation analysis and Lasso regression are then performed to avoid overfitting. Missing values are handled differently from traditional filling methods: machine learning algorithms are mainly used for imputation, including KNN imputation, multiple imputation, and missForest imputation. Before the final default prediction, the data were found to be imbalanced, so they were rebalanced first. The methods considered include undersampling, oversampling, and synthetic data generation, and the better-performing SMOTE algorithm was used to process the data. For the final default prediction, the thesis first compares the prediction accuracy of classic single algorithms from multiple angles and then builds a Stacking ensemble on top of the single algorithms. The ensemble model is found to effectively improve prediction accuracy. In addition, the thesis compares, in the final prediction model, data filled by traditional methods with data filled by machine learning algorithms. The results show that the data imputed with machine learning algorithms are more faithful to the original data and yield better final predictions.
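To make the SMOTE step mentioned in the abstract concrete, the following is a minimal sketch of SMOTE-style oversampling, not the thesis's own implementation. The function name `smote`, its parameters, and the toy data are all hypothetical and chosen for illustration. Each synthetic row is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    `minority` is a list of equal-length numeric feature vectors.
    Illustrative helper only; name and signature are assumptions.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Choose one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate: base + gap * (neighbour - base), with gap in [0, 1).
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: two numeric features per defaulted borrower.
defaulted = [[0.9, 0.2], [1.0, 0.3], [0.8, 0.25], [0.95, 0.35]]
new_rows = smote(defaulted, n_new=6, k=2)
print(len(new_rows), "synthetic rows generated")
```

Because every synthetic row is a convex combination of two real minority rows, it stays inside the range of the observed minority data. In practice, oversampling of this kind is normally applied to the training split only, so that the evaluation data contain no synthetic samples.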
Keywords: Feature selection, Missing value processing, Unbalanced data, Stacking model