To provide a decision-making basis for Chinese manufacturing enterprises in handling and preventing risk, this thesis takes Chinese A-share listed manufacturing companies as the research sample and uses the average of each company's financial characteristic data for 2019 (year T-3) and 2020 (year T-2) to predict its financial risk status in 2022 (year T), with the aim of finding the optimal financial early warning model for listed manufacturing companies. First, during data preprocessing, missing values, outliers, and duplicate values are detected and handled in turn. Outliers are detected with the isolation forest anomaly detection algorithm from machine learning, and the results show that the algorithm performs well. During exploratory data analysis, the continuous feature variables, the discrete target variable, and the relationships between them are analyzed and visualized to gain a deeper understanding of the data. Second, for comparison, the Borderline-SMOTE oversampling algorithm and the SMOTE-ENN combined sampling algorithm are each used to balance the preprocessed data set, yielding two balanced data sets. Three feature screening methods are then applied to each balanced data set, followed by principal component dimensionality reduction. This yields 16 principal components for each data set, which serve as the feature variables for the financial early warning models. Next, based on the 16 principal components of each data set and the corresponding target variable, single models and ensemble models for financial early warning are built with machine learning methods, and their early warning performance is compared horizontally (across sampling methods) and vertically (across models on the same data set). The horizontal comparison shows that the financial early warning models classify better when class imbalance is handled with SMOTE-ENN combined sampling than with Borderline-SMOTE oversampling. The vertical comparison shows that the LightGBM model classifies best on the Borderline-SMOTE data set, while the BP neural network model classifies best on the SMOTE-ENN data set. Finally, Stacking fusion models for financial early warning are built on top of the single and ensemble models for each data set, and comparative analysis identifies the optimal financial early warning model of this thesis: the Stacking fusion model under SMOTE-ENN combined sampling, with an accuracy of 96.86%, an F1 score of 97.32%, and an AUC of 98.98%. Comparison with the optimal models reported in similar studies confirms that this model is effective, generalizes well, shows no overfitting, and is relatively stable.
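
For readers less familiar with the three evaluation measures reported above (accuracy, F1 value, and AUC), the following is a minimal sketch of how they can be computed for a binary early warning classifier. It is illustrative only and is not the thesis's implementation: the function names `accuracy_and_f1` and `roc_auc`, the convention that label 1 marks a company in financial risk, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for this example.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, F1) for binary labels; 1 is assumed to be the at-risk class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed warnings
    accuracy = (tp + tn) / len(pairs)
    denom = 2 * tp + fp + fn
    # F1 is the harmonic mean of precision and recall, i.e. 2TP / (2TP + FP + FN).
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): true labels and predicted risk probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc, f1 = accuracy_and_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy={acc:.4f}  F1={f1:.4f}  AUC={auc:.4f}")
```

Accuracy and F1 depend on the chosen decision threshold, whereas AUC is computed from the ranking of the predicted scores alone, which is why the three measures can differ for the same model.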