| In recent years, the P2P online lending industry has developed rapidly and has become an important financing channel for small and medium-sized enterprises and individuals. It has brought both opportunities and challenges to the rapid development of China's economy. In traditional lending, commercial banks judge a borrower's ability and willingness to repay by means of asset collateral and background investigation, in order to reduce the risk and loss that the lender may incur. P2P lending, by contrast, takes place on the Internet, with no direct contact between lenders and borrowers, so information asymmetry is substantial. Investors often fail to judge a borrower's credit level accurately, which leads to larger investment losses and seriously harms their interests. Improving the risk identification ability of Internet finance platforms and raising their level of risk control is therefore a requirement for P2P lending platforms and a core task of Internet finance in China. Using historical borrower data, this paper applies machine learning methods to identify users with a high probability of default. The borrower's soft and hard information is converted into a credit score that measures the user's default risk, so that investors can assess the borrower's credit risk effectively, which helps them control risk and reduce investment losses. Based on Lending Club borrower data, this paper builds a profile of fraudulent users from borrowers' loan purpose, income level, place of residence, and years of employment, and qualitatively analyzes the likelihood that a borrower will default. In the empirical analysis, this paper constructs the data set through feature engineering, uses a logistic regression model to predict the default probability of P2P borrowers, and calculates each user's credit score from the predicted default probability, so that the score intuitively reflects the user's creditworthiness and hence the potential default risk of the P2P borrower. After the model is constructed, the data are split into training subsets and a test set according to loan time. The KS statistic and the PSI are used to assess the model's discriminatory power and its stability across the monthly subsets of the training set and the test set. Finally, logistic regression models are fitted to the training subsets to identify the key features, in order to study how the model's features change over time. |
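
The evaluation steps named in the abstract (the KS statistic, the PSI, and the conversion of a default probability into a credit score) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the equal-width binning used for the PSI, and the scaling constants `base_score`, `base_odds`, and `pdo` are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math


def ks_statistic(scores, labels):
    """Largest gap between the cumulative score distributions of
    defaulters (label 1) and non-defaulters (label 0)."""
    pairs = sorted(zip(scores, labels))
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    cum_bad = cum_good = 0
    ks = 0.0
    i = 0
    while i < len(pairs):
        # Consume all observations that share the same score before comparing.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            j += 1
        ks = max(ks, abs(cum_bad / n_bad - cum_good / n_good))
        i = j
    return ks


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population stability index between two score samples.
    Bins are equal-width over the range of `expected` (an assumption)."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant sample

    def shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def probability_to_score(p, base_score=600, base_odds=50, pdo=20):
    """Map a default probability to a score where higher means lower risk.
    The three scaling constants are hypothetical, not taken from the paper."""
    p = min(max(p, 1e-9), 1 - 1e-9)
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return offset + factor * math.log((1 - p) / p)
```

Under these assumptions, `ks_statistic` would be applied to each monthly subset to compare discriminatory power, while `psi` would compare the score distribution of a later month against a reference month; the thresholds used to interpret either value are left to the reader, as the abstract does not state them.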