| Corporate bankruptcy has a severe negative impact on investors, financial institutions, industries, and even the market as a whole. Given the more uncertain economic environment under COVID-19, an effective corporate bankruptcy risk assessment model is of great practical significance to corporate stakeholders. This thesis uses three machine learning classification algorithms, namely Logistic Regression, Random Forest, and LightGBM, to build a corporate bankruptcy risk assessment model, and uses data on listed companies in Taiwan for experimental analysis. The Borderline-SMOTE algorithm is used to handle the imbalanced data, and feature selection is used to reduce the data dimensionality. The thesis pairs different filter and embedded feature selection methods with Logistic Regression and Random Forest respectively, investigates whether feature selection can improve the performance of the classification models, and identifies the best combination of feature selection method and classification model. LightGBM is not combined with feature selection methods, because it has its own exclusive feature bundling technique. The experimental results show that LASSO improves the Logistic Regression model overall, while Random Forest's embedded feature selection increases its own F2-measure. The combination with the highest recall on bankrupt companies is LASSO-Logistic Regression, with a recall of 89% and an AUC of 0.8834. However, the precision of this combination in predicting bankrupt companies is relatively low, which lowers its F2-measure and increases investigation cost in practice. The recall of LightGBM on bankrupt companies is about 82%, slightly lower than that of LASSO-Logistic Regression. However, LightGBM improves the precision in predicting bankrupt companies, so that its total F2-measure reaches 0.9296. In addition, the AUC of LightGBM is 0.8711, which also indicates good overall performance. The thesis then constructs a fusion model based on LASSO-Logistic Regression, Random Forest with its embedded feature selection, and LightGBM via the Soft Voting technique. The fusion model maintains a high recall of 89% on bankrupt companies and increases the AUC to 0.8938. All indicators of the fusion model are better than those of LASSO-Logistic Regression, but its F2-measure is lower than that of LightGBM. In conclusion, if stakeholders prefer the highest AUC or the highest recall on bankrupt companies, the fusion model is the best choice. If stakeholders prefer a balance between recall and precision on bankrupt companies, and thus a balance between risk and misjudgment cost, LightGBM is the better choice. |
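
The two ideas the abstract relies on most, soft voting and the F2-measure, can be illustrated with a minimal sketch in plain Python. This is not the thesis's implementation: the function names `soft_vote` and `f_beta`, the equal model weights, the 0.5 decision threshold, and the four-company toy data are all assumptions made for illustration. The sketch also computes F2 for the bankrupt class only, whereas the abstract reports a "total" F2-measure whose aggregation is not specified here.

```python
from typing import List, Optional, Sequence


def soft_vote(prob_lists: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Soft voting: weighted average of each model's predicted
    probability of bankruptcy, computed per company."""
    if weights is None:
        weights = [1.0] * len(prob_lists)  # assumption: equal weights
    total = sum(weights)
    n_samples = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_samples)
    ]


def f_beta(y_true: Sequence[int], y_pred: Sequence[int],
           beta: float = 2.0, positive: int = 1) -> float:
    """F-beta for the positive (bankrupt) class; beta=2 weights
    recall more heavily than precision."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical probabilities of bankruptcy from three models for four companies.
lr_probs = [0.9, 0.4, 0.6, 0.2]   # stands in for LASSO-Logistic Regression
rf_probs = [0.7, 0.3, 0.2, 0.1]   # stands in for Random Forest
gbm_probs = [0.8, 0.5, 0.4, 0.3]  # stands in for LightGBM

fused = soft_vote([lr_probs, rf_probs, gbm_probs])
y_pred = [1 if p >= 0.5 else 0 for p in fused]  # assumed 0.5 threshold
y_true = [1, 0, 1, 0]                           # toy ground truth
f2 = f_beta(y_true, y_pred, beta=2.0)
```

Because the F2-measure weights recall four times as heavily as precision (beta squared), a missed bankrupt company costs more than a false alarm, which matches the abstract's emphasis on recall for bankrupt companies.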