
An Empirical Study On Risk Identification Of P2P Lending Platform Based On R

Posted on: 2019-03-11
Degree: Master
Type: Thesis
Country: China
Candidate: L Li
Full Text: PDF
GTID: 2429330548477668
Subject: Applied statistics
Abstract/Summary:
In recent years, the rise of Internet finance has accelerated the flow of funds across the financial market. New business models have emerged rapidly, and the domestic P2P industry has grown especially fast, with the number of P2P platforms increasing every day. The accompanying problems and risks are growing as well: operators absconding with funds, withdrawal difficulties, and platform closures keep emerging, and investors face considerable risk. Identifying high-risk platforms among the many P2P platforms is a major problem for both the government and investors. This paper studies risk identification for P2P platforms so that investors can make more cautious investment decisions.

We first compare several models to find the algorithm with the best classification performance, and use it to select the combination of indicators with the greatest influence on P2P platform risk. Factor analysis is then used to compute a comprehensive factor score, and 627 P2P platforms are evaluated and ranked by that score. Finally, the top 50 and the bottom 50 platforms are used to verify the risk prediction ability of the resulting P2P platform risk assessment system.

The analysis is carried out mainly in R. Seven models are compared on classification performance: the SVM algorithm, the Boosting algorithm, the random forest algorithm, the Bagging algorithm, the classification and regression tree algorithm, the K-nearest-neighbor algorithm, and the quadratic discriminant analysis model. The training and test sets are divided by ten-fold cross-validation. The random forest algorithm, which shows the best classification performance, is then used for feature selection to determine variable importance. The optimal indicator system for P2P platform risk identification is selected from 32 variables, and the feature selection results show that 15 indicators have a great impact on P2P platform risk, including the comprehensive review score, the mean of the average interest rate, the mean number of borrowers, automatic bidding, the guarantee method, the net flow in the last 30 days, and the number of comments. Factor analysis is then applied to the P2P platform risk evaluation system constructed from the selected 13 indicators; four common factors are extracted and a comprehensive risk score is calculated for each platform. Finally, the 627 P2P platforms are ranked by comprehensive score, which supports the soundness of random forest feature selection and the good risk prediction ability of the P2P platform risk assessment system.

The 32 preliminary variables cover three aspects of the sample platforms: basic information, transaction information, and user comment information, collected with web-crawling tools (such as Python and Octopus). Before the analysis, the data are pre-processed for missing values, outliers, and standardization. Missing values are handled mainly by multiple imputation. Outliers are identified mainly with the outlier measure of the random forest model, then deleted and filled in again by multiple imputation. Positive and reverse indicators are standardized separately. In the empirical analysis, the SMOTE method is used to balance the sample data.

The following conclusions are drawn from the empirical analysis.

Conclusion 1: The random forest algorithm has the best classification performance for P2P platform risk identification in terms of the overall classification accuracy, the Type I error rate (high-risk platforms misjudged as low risk), and the Type II error rate (low-risk platforms misjudged as high risk).

Conclusion 2: Feature selection by the random forest algorithm shows that 15 indicators, such as the comprehensive review score, the mean of the average interest rate, the mean number of borrowers, automatic bidding, the guarantee method, the 30-day net inflow fluctuation, and the number of reviews, have the greatest impact on P2P platform risk. The comprehensive score calculated by factor analysis further verifies that the random forest algorithm has good classification performance.

Conclusion 3: Both the random forest variable importance ranking and the factor analysis show that, among all variables, public opinion information is the first principal factor and has the greatest impact on P2P platform risk. This further illustrates that online public opinion is critical to risk identification for P2P platforms.
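
The three criteria named in Conclusion 1 can be illustrated with a small sketch. The thesis itself worked in R; the Python below is not the author's code. The function name `risk_error_rates`, the 1/0 label encoding, and the choice of denominator (each error rate is taken relative to the number of platforms that actually belong to that class) are all assumptions made for illustration, since the abstract does not state them.

```python
from typing import Dict, Sequence

# Hypothetical label encoding (not from the thesis): 1 = high-risk, 0 = low-risk.
HIGH, LOW = 1, 0


def risk_error_rates(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Return overall accuracy plus the two error rates described in the abstract.

    type_1_error: share of truly high-risk platforms predicted as low risk.
    type_2_error: share of truly low-risk platforms predicted as high risk.
    Using the per-class count as denominator is an assumption of this sketch.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and of equal length")

    pairs = list(zip(actual, predicted))
    n_high = sum(1 for a, _ in pairs if a == HIGH)
    n_low = len(pairs) - n_high

    # High-risk platforms that the classifier let through as low risk.
    missed_high = sum(1 for a, p in pairs if a == HIGH and p == LOW)
    # Low-risk platforms that the classifier flagged as high risk.
    false_alarm = sum(1 for a, p in pairs if a == LOW and p == HIGH)
    correct = sum(1 for a, p in pairs if a == p)

    return {
        "accuracy": correct / len(pairs),
        "type_1_error": missed_high / n_high if n_high else 0.0,
        "type_2_error": false_alarm / n_low if n_low else 0.0,
    }


if __name__ == "__main__":
    # Toy labels for ten made-up platforms, not data from the study.
    actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
    print(risk_error_rates(actual, predicted))
```

For an investor the Type I error is usually the costlier one, because it means a high-risk platform was presented as safe, which is one reason to report the two error rates separately rather than accuracy alone.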
Keywords/Search Tags: P2P platform, data mining, risk identification, public opinion analysis, random forest algorithm, factor analysis