| With the development of network information technology and finance, many Internet-based credit platforms have emerged. Innovation is always accompanied by risk, however, and online credit is no exception: credit defaults not only create turnover difficulties for lending platforms but also harm the healthy and sustainable development of the national economy. China's financial platforms therefore urgently need a stable, accurate, and discriminating personal credit risk assessment model in order to control and prevent credit risk. This paper is based on real credit data from a US P2P online lending platform covering the first quarter of 2018 through the third quarter of 2020, and aims to provide online lending platforms with a more stable and efficient basis for personal risk assessment. The original dataset was preprocessed as follows: deletion of unrelated features, handling of missing values and outliers, data format conversion, encoding of character data, and sample balancing. To balance the samples, two resampling methods, random downsampling and SMOTE upsampling, were applied to the imbalanced training set; comparing the performance of each model showed that random downsampling gave better results than SMOTE upsampling. In feature selection, variance filtering, correlation testing, and the random forest algorithm were used in turn to eliminate highly correlated and unimportant features, leaving 19 features. This paper first uses grid search to find the optimal parameters of each model, then builds three single models (logistic regression, random forest, and LightGBM) and measures the predictive performance of each by comparing its accuracy (measured by AUC), precision, stability, and KS values, and finally uses the Voting framework
to obtain a more effective fusion model from the three single models. The performance of the ensemble model is then compared across the different metrics. The empirical results show that (1) at the data level, 19 features remain after data cleaning and feature engineering, and models built after random downsampling of the training set perform better than those built after SMOTE upsampling; (2) at the model level, the fusion model based on the Random Under Sampling-Voting framework improves, to a certain extent, the prediction accuracy, stability, and risk discrimination ability of the general model. The fusion model based on the Random Under Sampling-Voting framework is better than logistic regression and random forest in AUC but slightly lower than the boosting algorithm Random Under Sampling-LightGBM, and it performs best on the KS indicator. This indicates that the Voting model has the best discrimination, while the Random Under Sampling-LightGBM model has the highest accuracy. In terms of stability, the fusion model based on the Random Under Sampling-LightGBM framework is second only to the traditional learning algorithm logistic regression and is more stable than the other two ensemble algorithms. In the risk management of any credit institution, it is essential to control borrowers' default risk, screen for high-quality customers, track and monitor customers' long-term risk, and build a precise and stable risk assessment model that predicts borrower repayment; the research in this paper therefore offers some reference value. |
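
The abstract names three techniques without defining them: random downsampling, Voting fusion, and the KS index. The sketch below illustrates each in plain Python. It is not the paper's implementation. The function names are hypothetical, the vote is assumed to be an unweighted soft vote (a simple average of predicted default probabilities), and KS is taken as the usual maximum gap between the cumulative score distributions of defaulters and non-defaulters. The abstract does not specify the voting type or weights.

```python
import random
from typing import List, Sequence, Tuple

Row = Tuple[list, int]  # (feature vector, label); label 1 = default, 0 = repaid


def random_undersample(rows: Sequence[Row], seed: int = 0) -> List[Row]:
    """Drop majority-class rows at random until both classes are the same size."""
    rng = random.Random(seed)
    pos = [r for r in rows if r[1] == 1]
    neg = [r for r in rows if r[1] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


def soft_vote(prob_lists: Sequence[Sequence[float]]) -> List[float]:
    """Average the default probabilities predicted by several models.

    Assumption: equal weights. The source does not state the voting scheme.
    """
    n_models = len(prob_lists)
    return [sum(column) / n_models for column in zip(*prob_lists)]


def ks_statistic(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Largest gap between the two classes' cumulative score distributions."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0])
    cum_pos = 0
    cum_neg = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        # Consume every sample tied at the current score before comparing.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_pos += 1
            else:
                cum_neg += 1
            j += 1
        best = max(best, abs(cum_pos / n_pos - cum_neg / n_neg))
        i = j
    return best
```

In a pipeline like the one described, `random_undersample` would be applied to the training split only, each single model would score the untouched test split, and `ks_statistic` would be evaluated on the output of `soft_vote` as well as on each model's own scores.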