| In recent years, as downward pressure on China’s economy has increased, business conditions for enterprises have gradually deteriorated and the cost of external financing has risen, leading to frequent loan defaults that seriously undermine the profitability and sound operation of commercial banks and other financial institutions. Against this background, this thesis uses machine learning methods to construct an identification model for enterprise loan default risk, in order to identify enterprises that are likely to default and to help financial institutions improve their risk management. First, the collected raw data are preprocessed using recursive feature elimination and logistic regression with L1 regularization to obtain the optimal feature subset, and the data are balanced by combining SMOTE oversampling with Tomek Link undersampling. Next, machine learning approaches to identifying enterprise loan default risk are discussed, and SVM, XGBoost, KNN, and Extra Trees models are fitted to the processed dataset, with cross-validation combined with grid search used to obtain the optimal parameters. To further improve identification performance, a Stacking model is constructed on the basis of these four single models. The results show that the Stacking model outperforms the single models in terms of F1, AUC, G-mean, and Accuracy, and is more accurate in identifying loan defaulters. Finally, the importance of the indicators is verified. With regard to indicator selection, most studies select only specific financial indicators to study corporate loan default, whereas the results of this study show that adding indicators that represent corporate governance structure, such as ownership structure, executive compensation, and audit opinion, can significantly improve the performance of the identification models. When the Extra Trees and XGBoost models are used for feature-importance screening, both models show that the top three features are the retained earnings to assets ratio, the net income to comprehensive income ratio, and audit opinion type. That is, these three features help to identify potential loan default risk earlier. |
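
Three of the evaluation measures named in the abstract (Accuracy, F1, and G-mean) can be illustrated with a minimal sketch. It assumes the standard binary definitions with the defaulting class coded as 1, which the abstract does not spell out. The function name `binary_metrics` and the toy labels are hypothetical and are not taken from the thesis. AUC is omitted because it requires predicted scores rather than hard labels.

```python
import math


def binary_metrics(y_true, y_pred):
    """Return accuracy, F1 and G-mean for 0/1 labels (assumed: 1 = default)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defaults caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-defaults cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed defaults

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    # G-mean: geometric mean of sensitivity and specificity.
    g_mean = math.sqrt(recall * specificity)
    return {"accuracy": accuracy, "f1": f1, "g_mean": g_mean}


# Hypothetical toy data: 2 defaulters among 10 enterprises.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10                          # never flags a default
one_hit_one_miss = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]  # catches one defaulter

print(binary_metrics(y_true, always_negative))
print(binary_metrics(y_true, one_hit_one_miss))
```

On this toy data both predictors reach the same accuracy, yet the one that never flags a default scores zero on F1 and G-mean. This is one reason why, on imbalanced default data, accuracy alone is usually supplemented by measures such as F1 and G-mean.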