| In 2014, the "11-Chaori-Bond" default marked the beginning of a wave of defaults in the Chinese capital market. Since then, not only private enterprises but also large state-owned enterprises and entities with AAA credit ratings have defaulted. As of the first half of 2021, 647 credit bonds had defaulted, and the defaulting entities span all industries. Now that the system of "rigid payment" has been broken, bond defaults have become an inevitable trend. With the slowdown of global economic growth, continuing trade frictions, the escalation of regional conflicts, and the global spread of the epidemic in recent years, frequent bond defaults appear all but certain, which will lead to higher risk premium compensation. Bond defaults have both micro-level causes specific to the issuer and macro-level causes arising from the general economic environment. Drawing on the existing literature, this paper selects 26 financial indicators as micro-level indicators together with a set of macro-level indicators, uses the T-1 annual reports of issuers that defaulted for the first time as recorded in the Wind database, and excludes enterprises with incomplete disclosure from the modeling sample. Control-group enterprises were matched to the defaulting enterprises at a ratio of 1:10; they are non-defaulting enterprises in the same industry and of similar size (measured by total assets, with a difference within 10,000). Because the default sample is small, this paper preprocesses the imbalanced sample set with the ADASYN algorithm. It also introduces six commonly used default early-warning models, such as random forest, logistic regression, and LightGBM, and compares their results with those of XGBoost. Finally, random search and Bayesian optimization are used to tune the XGBoost parameters jointly, after which individual parameters are fine-tuned to find the optimal setting. SHAP values are then used to explain the results
of the XGBoost model. The empirical results show that the synthetic samples largely follow the decision logic of the original samples while also shifting the relative importance of individual indicators. In addition, XGBoost predicts well: after the data set is balanced with ADASYN and the parameters are tuned by random search, the AUC reaches 0.8611, which is better than that of the random forest, GBDT, LightGBM, SVM, decision tree, and logistic regression models. The ensemble algorithm has strong discriminative ability, resists over-fitting, and is well suited to bond default prediction. According to the SHAP visualization, financial indicators remain the important variables in the T-1 year forecast model. This may be because corporate governance acts slowly and the transmission of macroeconomic conditions takes time, or because their effects are already reflected in the financial indicators. The innovations of this paper are: (1) SHAP values are used to visualize the results, so the XGBoost model is no longer a "black box" in the traditional sense, which helps model users observe the variables and the prediction results more intuitively. (2) In terms of indicators, most of the financial distress prediction literature focuses solely on either the micro level or the macro level, whereas this paper combines macro, micro, and corporate governance indicators in modeling to consider defaults from a more comprehensive perspective. (3) In terms of research methods, this paper adopts both the popular boosting algorithms and traditional forecasting models and compares them, so that bond defaults are predicted more directly and objectively. |