In China's current period of economic growth, online lending has risen rapidly within the Internet finance industry to meet society's diverse borrowing needs, and the number of online lending institutions has expanded quickly. At the same time, the industry has run into a series of problems, the most constraining of which is a high loan default rate. Reducing that rate with effective methods is therefore the top priority. This paper uses machine learning algorithms to build an "online loan default risk identification" model for the pre-loan audit stage, so that potential default risk can be identified accurately and flagged early. Containing default risk at the pre-loan audit stage as far as possible greatly relieves the pressure on in-loan review and post-loan management, which in turn lowers the platform's operating costs and the economic losses caused by a high default rate, and improves the platform's risk control quality and viability.

First, the paper briefly describes the development of the online loan industry in recent years and the causes of default risk, and compares default risk identification methods from the perspectives of traditional finance and Internet finance. Next, taking the desensitized user loan data set of an online loan platform as an example, it performs data cleaning and feature engineering on the pre-loan information, uses a RoBERTa model to determine the number of negative items in each user's public opinion information, and adds the result as a feature for subsequent model training. Using the processed data set, it then builds "online loan default risk identification" models with logistic regression, support vector machine, random forest, XGBoost, and MLP; tunes each model's parameters by grid search; and evaluates each model's predictive performance using a combination of AUC, KS value, F1 score, and accuracy. Finally, XGBoost and logistic regression are fused in an attempt to further improve predictive performance.

The results show that a machine learning model applied in the pre-loan audit can identify default risk, and that it does so more efficiently and accurately than manual identification. Among the models studied, XGBoost has the best predictive performance, and the fusion of XGBoost and logistic regression performs slightly worse than XGBoost alone. In its later trial run, the model identified users' default risk efficiently and accurately and, combined with the platform's OA system, issued timely risk warnings, reducing tedious manual work and platform operating costs. In summary, the results of this paper can effectively help online loan platforms identify users' default risk at the pre-loan audit stage and reduce the economic losses caused by a high default rate.
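
Two of the evaluation metrics named above, the KS value and AUC, can be illustrated with a minimal Python sketch. It assumes binary labels (1 = default, 0 = no default) and a model score where a higher value means higher predicted risk. The function names `ks_statistic` and `auc` and the sample data are hypothetical and are not taken from the paper, which does not state how its metrics were implemented.

```python
from typing import Sequence


def ks_statistic(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """KS value: the largest gap between the cumulative share of defaulters
    and the cumulative share of non-defaulters as the score threshold drops."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    # Walk through the samples from the highest score to the lowest.
    pairs = sorted(zip(y_score, y_true), key=lambda p: -p[0])
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Samples with tied scores cross the threshold together.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        best = max(best, abs(tp / n_pos - fp / n_neg))
        i = j
    return best


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC: probability that a random defaulter is scored above a random
    non-defaulter, with ties counted as half. O(n_pos * n_neg), for clarity."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Illustrative labels and scores only, not data from the paper.
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print("KS :", ks_statistic(labels, scores))   # 0.5
    print("AUC:", auc(labels, scores))            # 0.75
```

In this toy example one defaulter is scored below one non-defaulter, so three of the four defaulter/non-defaulter pairs are ranked correctly (AUC 0.75) and the cumulative distributions never separate by more than half (KS 0.5). A model that ranks every defaulter above every non-defaulter would reach 1.0 on both.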