| Financial performance is an important component of a corporation’s overall performance, and effective prediction of its trend supports the corporation’s future business decisions. Market competition is becoming increasingly fierce, turbulence in international financial markets is intensifying under the impact of the epidemic, and China’s listed corporations face unprecedented risks and challenges in an open environment. In this context, establishing a sound financial performance forecasting mechanism helps enterprises regulate and manage their own operational risks, which is of great significance for reducing systematic risk and maintaining the stable and healthy development of the financial market. Based on the financial indicator data of listed corporations, this paper discusses, through theoretical analysis, the feasibility and scientific validity of using multi-source data with financial and non-financial components to predict corporate financial performance. Firstly, a financial performance prediction indicator system comprising financial indicators, macroeconomic indicators, ESG indicators, and financial text indicators is constructed through quantitative analysis. Secondly, five machine learning models (Logistic, ANN, SVM, XGBoost, and LightGBM) are used to predict the financial performance of listed corporations in China. The prediction performance of the models on financial indicator data and on multi-source indicator data is compared and analyzed, and the optimal financial prediction model is selected. Thirdly, this paper uses the SHAP method to analyze the decision-making process of the prediction model, explore the importance of each feature variable to the model’s predictions, and analyze the relationship between each feature indicator and financial performance. Finally, from the perspective of robustness testing, this paper uses random over-sampling and under-sampling to balance the company data as a stability test for the 
model. The empirical results show that, compared with using only financial feature data, the forecasting accuracy of the models constructed in this paper improves with multi-source feature data, and the LightGBM model achieves the best prediction accuracy on both datasets. The forecasting AUC was 0.8886 on the financial feature data and 0.9004 on the multi-source feature data, an increase of 1.18 percentage points, and the robustness test results show that the model is robust and reliable. The interpretability results for the LightGBM model show that feature importance differs between the models trained on the two datasets but is broadly similar overall, and that non-financial indicators can provide additional useful information for financial performance forecasting. |
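
The robustness test described above balances the company data by random over-sampling and under-sampling. The paper does not give an implementation, so the following is only a minimal sketch under stated assumptions: the function name `random_resample`, its parameters, and the toy data are hypothetical, and only the Python standard library is used. Over-sampling duplicates minority-class rows (drawn with replacement) until every class matches the largest one; under-sampling draws majority-class rows without replacement until every class matches the smallest one.

```python
import random
from collections import Counter


def random_resample(rows, labels, mode="over", seed=0):
    """Balance a labelled dataset by random over- or under-sampling.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if mode not in ("over", "under"):
        raise ValueError("mode must be 'over' or 'under'")
    rng = random.Random(seed)

    # Group rows by class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    sizes = [len(items) for items in by_class.values()]
    # Over-sampling grows every class to the largest class size;
    # under-sampling shrinks every class to the smallest class size.
    target = max(sizes) if mode == "over" else min(sizes)

    out_rows, out_labels = [], []
    for label, items in by_class.items():
        if len(items) < target:
            # Keep all originals and add duplicates drawn with replacement.
            chosen = items + rng.choices(items, k=target - len(items))
        elif len(items) > target:
            # Draw a subset without replacement.
            chosen = rng.sample(items, target)
        else:
            chosen = list(items)
        out_rows.extend(chosen)
        out_labels.extend([label] * len(chosen))
    return out_rows, out_labels


# Toy data (assumed): 2 positive and 8 negative samples, one feature each.
rows = [[i] for i in range(10)]
labels = [1, 1] + [0] * 8

over_rows, over_labels = random_resample(rows, labels, mode="over")
under_rows, under_labels = random_resample(rows, labels, mode="under")

print(Counter(over_labels))
print(Counter(under_labels))
```

In a setup like this, resampling would normally be applied only to the training split, so that the evaluation data keep the original class distribution; whether the paper does so is not stated in the abstract.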