| With the development of the social economy, personal credit plays an increasingly important role in economic life. Credit rating in China started late and is narrow in scope, and it lags well behind credit rating abroad. Compared with traditional credit loans, P2P online lending in China has grown rapidly in recent years and meets the borrowing needs of more people. P2P online lending platforms indicate a development direction for China's future financial institutions, but they face major challenges in credit risk assessment and supervision. This article studies Lending Club, a representative P2P online lending platform. As the world's first listed P2P online lending platform, Lending Club operates efficiently and has a well-developed credit scoring system; after more than ten years of development, it has become the world's largest P2P online lending platform. This article collects Lending Club user loan data from 2007 to 2015. First, focusing on two typical loan states, loans fully repaid within the agreed term and loans that could not be repaid and became bad debts, it analyzes the attributes users provide when applying for a loan: the relationship between loan attributes, personal attributes, and loan status, and the correlations among personal attributes. Then the data set is cleaned by handling missing attribute values, converting qualitative attributes into quantitative ones, and selecting relevant attributes, after which it is split and standardized. Next, a Naive Bayes classifier, logistic regression, a linear support vector machine, a Fisher linear classifier, a BP neural network, a CART classification tree, and XGBoost are trained on the split data set and used to predict whether each user's account is good or a bad debt, providing an evaluation result that can serve as a reference for loan approval. The article compares the performance of these machine learning models on the Lending Club data from several angles, including accuracy, precision, recall, and AUC. Finally, based on the user grades assigned by Lending Club, a subset of users that is easier to classify is separated out and the models are evaluated on it again to improve prediction performance; these results are compared and discussed against those on the original data set. The study offers a reference method for building and improving personal credit risk assessment in China and contributes to China's economic development. The study also has limitations. Lending Club data for other loan statuses are not used, and the treatment of missing values and of attribute correlations may affect the prediction of users' loan status. In addition, some "good account" and "bad account" users on Lending Club have similar characteristics, which makes it difficult for the classification models to separate them correctly and lowers prediction accuracy. We hope to overcome these difficulties in future research. |
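The abstract names the preprocessing and evaluation steps without showing them. As a minimal, hedged sketch (not the author's implementation), the stdlib-only Python below illustrates two of them: standardization fitted on the training split only (assuming z-score standardization, which the abstract does not specify) and the accuracy, precision, recall, and AUC metrics used to compare the models. The function names, the label encoding (1 = bad debt, 0 = fully paid), and the 0.5 decision threshold are assumptions for illustration, and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def standardize(train: Matrix, test: Matrix) -> Tuple[Matrix, Matrix]:
    """Z-score standardization (assumed variant). Means and standard deviations are
    estimated on the training split only and then applied to both splits, so no
    test-set statistics leak into training."""
    n_rows, n_cols = len(train), len(train[0])
    means = [sum(row[j] for row in train) / n_rows for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        var = sum((row[j] - means[j]) ** 2 for row in train) / n_rows
        stds.append(var ** 0.5 or 1.0)  # constant column: divide by 1 instead of 0

    def apply(rows: Matrix) -> Matrix:
        return [[(row[j] - means[j]) / stds[j] for j in range(n_cols)] for row in rows]

    return apply(train), apply(test)


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN), assuming label 1 = bad debt (positive), 0 = fully paid."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, fp, fn, tn = confusion(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    # Of the loans flagged as bad debt, the fraction that really became bad debt.
    tp, fp, _, _ = confusion(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    # Of the loans that really became bad debt, the fraction the model flagged.
    tp, _, fn, _ = confusion(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random bad-debt loan scores higher than a
    random fully paid loan (Mann-Whitney formulation; ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and model scores (hypothetical, not Lending Club data).
    y_true = [1, 0, 1, 1, 0, 0]
    scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(accuracy(y_true, y_pred))   # 4/6
    print(precision(y_true, y_pred))  # 2/3
    print(recall(y_true, y_pred))     # 2/3
    print(roc_auc(y_true, scores))    # 7/9
```

The pairwise AUC loop is O(P·N) and is meant only to make the definition concrete; on a full Lending Club split, a rank-based computation or a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.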