| Since the emergence of P2P online lending, the industry has gone through several ups and downs, and frequent credit risk events have become the main obstacle to its healthy development. Information asymmetry among platforms, borrowers, and investors is the main cause of this credit risk, and existing studies have shown that alleviating information asymmetry in online lending transactions can significantly reduce it. Because borrowers hold a dominant position in terms of information, research on the information they disclose is particularly important. Based on real transaction data from the Renren Loan platform, this paper examines the information disclosed by borrowers, analyzes the factors that influence lending success, and predicts lending outcomes with several classification models. 1. Variables are selected and organized from the original transaction data set, and missing values and noisy data are removed. Drawing on existing research, hard information variables are selected from four aspects: order information, borrowers' basic personal information, economic ability, and credit information. Variables describing the linguistic features and content topic features of the loan description are added as supplementary soft information. Several visualization methods are used to describe and display the distributions of important variables. 2. The order status, namely the success or failure of borrowing, is taken as the dependent variable. First, logistic regression with the hard information variables as independent variables is used to explore the influence of hard information on the lending result. Based on these results, hard information control variables are established, and the variables related to the loan description are then incorporated into the regression model to explore their influence. In addition, variable importance is analyzed with multiple classification models. The results show that the borrower's credit information has a significant impact on loan success, and the credit limit variable shows high importance in every model. Both the linguistic and the content features of the loan description influence the loan result, and the related variables rank as clearly important. 3. The data set is divided into a training set and a test set at a ratio of 7:3. With a subset of hard information variables as the baseline inputs, logistic regression, neural network, random forest, and support vector machine (SVM) models are used to predict borrowing success, and the prediction results before and after adding the loan description variables are compared. Among the four models, random forest achieves higher overall prediction accuracy than the others, while logistic regression and SVM perform similarly. After the loan description information is added, the prediction results of every model improve; in particular, the prediction accuracy on positive samples (TPR) improves significantly, and the ROC curves are consistent with these conclusions. 4. The performance and applicability of each model are checked for robustness with training sets of 80% and 60%. The results show that the size of the training set does not affect model performance on the indicators derived from the confusion matrix, which further validates that the loan description improves the prediction results. |
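
The evaluation steps summarized above (a 7:3 split, robustness checks with other training-set sizes, and confusion-matrix indicators such as TPR) can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `split_train_test` and `confusion_metrics`, the fixed seed, and the toy labels are assumptions made only for illustration, and label 1 is assumed to mean a successfully funded loan.

```python
import random


def split_train_test(rows, train_ratio=0.7, seed=0):
    """Shuffle rows reproducibly and split them into training and test sets.

    train_ratio=0.7 mirrors the 7:3 split; 0.8 and 0.6 mirror the
    robustness checks. The seed is an arbitrary illustrative choice.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]


def confusion_metrics(y_true, y_pred):
    """Return accuracy, TPR and TNR for binary labels (1 = positive sample)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # TPR: share of truly positive samples that were predicted positive.
        "tpr": tp / (tp + fn) if (tp + fn) else 0.0,
        # TNR: share of truly negative samples that were predicted negative.
        "tnr": tn / (tn + fp) if (tn + fp) else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical labels: 3 funded loans among 10 orders.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
    print(confusion_metrics(y_true, y_pred))
    for ratio in (0.7, 0.8, 0.6):
        train, test = split_train_test(range(10), train_ratio=ratio)
        print(ratio, len(train), len(test))
```

In the toy labels, accuracy is 0.7 while TPR is only 1/3, which shows why an improvement in TPR is worth reporting separately from overall accuracy when positive samples are a minority.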