| The financial statements of a listed company are comprehensive documents that reflect its operating condition over a given period, and Chinese law requires listed companies to disclose them to the public on a regular basis. In recent years, however, financial fraud by domestic listed companies has frequently come to light. When facing a financial crisis, some listed companies, seeking to protect their own interests, choose to whitewash or even falsify their financial data at the first opportunity in order to bolster investor confidence or evade scrutiny by regulators. Such behavior has greatly undermined public confidence in China’s capital market, yet traditional manual audits have been unsatisfactory at identifying fraud by listed companies. Therefore, if data mining techniques could flag fraudulent annual financial data disclosed by listed companies in advance, investors’ losses and the market impact of forced delistings could be greatly reduced. To address this problem, this thesis takes the 2015 to 2019 financial data of listed manufacturing companies as its research object, analyzes a range of data mining models, and ultimately builds a Stacking fusion model optimized by a genetic algorithm (GA). The model determines whether a listed manufacturing company has committed financial fraud in the current year. The main work is as follows. (1) Feature selection is performed with a genetic algorithm, SHAP feature analysis, and statistical tests. Confirmatory factor analysis is then applied to the selected fraud-identification indicators to reduce dimensionality and make the model easier to interpret, and a second-level indicator system is established from the factor analysis results. (2) The identification performance of logistic regression, support vector machine (SVM), random forest, XGBoost, LightGBM, and back-propagation (BP) neural network models is compared side by side, and the three best-performing models are combined into a Stacking fusion model. GA is then used to optimize this Stacking model, yielding a GA-Stacking fusion model with better identification performance. The empirical results show the following. (1) Indicators of financial fraud in listed manufacturing companies are concentrated mainly in business development capability and certain asset and liability items. Because these indicators are often the most direct reflection of a company’s development, they have become the preferred targets for listed manufacturing companies seeking to deceive investors through financial fraud. (2) For identifying financial fraud in listed manufacturing companies, the GA-Stacking fusion model performs significantly better than any single machine learning model. Among the single models, the BP neural network performs best and logistic regression performs worst. In addition, ensemble models outperform non-ensemble models, and the SVM model shows a particular advantage in identifying fraud samples. (3) This case and the GA-Stacking model demonstrate that genetic algorithms have distinct advantages over traditional grid search and random search and are well suited to optimizing models on large-sample datasets. |
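GA-based feature selection can be illustrated with a minimal sketch that uses only the Python standard library. The abstract does not specify the chromosome encoding or genetic operators, so everything here is an assumption for illustration. Each chromosome is a binary mask over candidate indicators. Fitness is the holdout accuracy of a simple nearest-centroid classifier minus a small per-feature penalty. The operators are tournament selection, one-point crossover, bit-flip mutation, and two-individual elitism. `make_data`, `centroid_accuracy`, and `ga_select` are hypothetical helpers, and the data is synthetic rather than the manufacturing-company sample.

```python
import random

def make_data(n, n_features=10, n_informative=3, seed=0):
    """Hypothetical synthetic data: only the first n_informative columns carry class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        shift = 1.5 if label else -1.5
        X.append([rng.gauss(shift if j < n_informative else 0.0, 1.0) for j in range(n_features)])
        y.append(label)
    return X, y

def centroid_accuracy(mask, X_tr, y_tr, X_te, y_te):
    """Holdout accuracy of a nearest-centroid classifier using only the columns where mask is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X_tr, y_tr) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_te, y_te):
        dist = {lab: sum((x[j] - c) ** 2 for j, c in zip(cols, cen)) for lab, cen in centroids.items()}
        correct += int(min(dist, key=dist.get) == t)
    return correct / len(y_te)

def ga_select(X_tr, y_tr, X_te, y_te, pop_size=20, generations=15, p_mut=0.05, penalty=0.01, seed=1):
    """Genetic algorithm over binary feature masks; returns the best mask and per-generation best fitness."""
    rng = random.Random(seed)
    n = len(X_tr[0])
    cache = {}

    def fitness(mask):
        # holdout accuracy minus a small cost per selected feature (favours compact subsets)
        key = tuple(mask)
        if key not in cache:
            cache[key] = centroid_accuracy(mask, X_tr, y_tr, X_te, y_te) - penalty * sum(mask)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        next_pop = [ranked[0][:], ranked[1][:]]  # elitism: keep the two best unchanged
        while len(next_pop) < pop_size:
            # tournament selection: best of 3 randomly drawn individuals, done twice
            p1 = max(rng.sample(ranked, 3), key=fitness)
            p2 = max(rng.sample(ranked, 3), key=fitness)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    return best, history

X, y = make_data(300)
best, history = ga_select(X[:200], y[:200], X[200:], y[200:])
print("selected feature indices:", [j for j, bit in enumerate(best) if bit])
print("best fitness per generation:", [round(h, 3) for h in history])
```

Because the two best individuals are carried over unchanged, the best fitness cannot decrease from one generation to the next. The same loop could, in principle, score hyperparameter candidates for a Stacking model instead of feature masks by swapping in a different chromosome and fitness function.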
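A Stacking fusion model is commonly built with out-of-fold predictions, and a minimal standard-library sketch of that approach follows. The abstract does not state the fold scheme or meta-learner, so both are assumptions here: 5 interleaved folds and a logistic-regression meta-learner fitted by gradient descent. The thesis's three best single models are replaced by two hypothetical pure-Python stand-ins, `fit_knn` and `fit_gaussian_nb`, so the example runs without third-party packages. The data is synthetic.

```python
import math
import random

def make_data(n, seed=0):
    """Hypothetical synthetic data; label 1 stands in for a fraud sample."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        s = 1.0 if label else -1.0
        X.append([rng.gauss(s, 1.0), rng.gauss(0.5 * s, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y

def sigmoid(z):
    z = max(-50.0, min(50.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))

def fit_knn(X, y, k=7):
    """Base learner 1: P(y=1|x) = share of positives among the k nearest training points."""
    def predict(x):
        nearest = sorted(zip(X, y), key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[:k]
        return sum(label for _, label in nearest) / k
    return predict

def fit_gaussian_nb(X, y):
    """Base learner 2: Gaussian naive Bayes with per-class feature means and variances."""
    stats = {}
    for c in (0, 1):
        cols = list(zip(*[x for x, t in zip(X, y) if t == c]))
        means = [sum(col) / len(col) for col in cols]
        variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9 for col, m in zip(cols, means)]
        stats[c] = (math.log(len(cols[0]) / len(X)), means, variances)
    def predict(x):
        logp = {c: prior + sum(-0.5 * math.log(2 * math.pi * var) - (xi - mu) ** 2 / (2 * var)
                               for xi, mu, var in zip(x, mus, vars_))
                for c, (prior, mus, vars_) in stats.items()}
        return sigmoid(logp[1] - logp[0])  # posterior P(y=1|x)
    return predict

def fit_logistic(Z, y, lr=0.5, epochs=300):
    """Meta-learner: logistic regression fitted by batch gradient descent on log-loss."""
    w = [0.0] * (len(Z[0]) + 1)  # last entry is the bias
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for z, t in zip(Z, y):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + w[-1]) - t
            for j, zj in enumerate(z):
                grad[j] += err * zj
            grad[-1] += err
        w = [wi - lr * g / len(Z) for wi, g in zip(w, grad)]
    return lambda z: sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + w[-1])

BASE_LEARNERS = [fit_knn, fit_gaussian_nb]  # hypothetical stand-ins for the thesis's base models

def fit_stacking(X, y, n_folds=5):
    """Level 1: out-of-fold base-model probabilities become meta-features; level 2: logistic meta-learner."""
    n = len(X)
    meta = [[0.0] * len(BASE_LEARNERS) for _ in range(n)]
    for f in range(n_folds):
        held = set(range(f, n, n_folds))  # interleaved fold assignment
        train_idx = [i for i in range(n) if i not in held]
        for j, fit in enumerate(BASE_LEARNERS):
            model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            for i in held:
                meta[i][j] = model(X[i])  # prediction for a row this model never saw
    meta_model = fit_logistic(meta, y)
    bases = [fit(X, y) for fit in BASE_LEARNERS]  # refit base models on all training rows
    return lambda x: meta_model([m(x) for m in bases])

X, y = make_data(300)
predict = fit_stacking(X[:200], y[:200])
acc = sum(int((predict(x) >= 0.5) == bool(t)) for x, t in zip(X[200:], y[200:])) / len(X[200:])
print(f"holdout accuracy: {acc:.2f}")
```

The key design point is that the meta-learner is trained only on predictions each base model made for rows it did not see during fitting. Training it on in-sample predictions would let it overweight whichever base model overfits most. In a GA-Stacking setup, settings such as `k` or `n_folds` here could be encoded as genes and tuned by a genetic algorithm.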