| With growing computing power and rapid growth in available data, artificial intelligence techniques have developed quickly and have directly driven the iterative updating of research models in many fields. Because AI built around machine learning can mine the intrinsic value of data and the hidden connections within it more deeply, it offers many effective approaches to classification and regression problems. Machine learning can be applied in academic research to refute or corroborate hypotheses, and in practice to discover hidden patterns in large amounts of data. It has been widely used in many fields and is attracting increasing attention in finance, including for the identification of financial fraud. Fraud identification is a classic imbalanced binary classification problem: although the probability of fraud is low, fraud can have a severe negative impact once it occurs. Few studies take the raw features of financial statements as their starting point; the overseas study of Bao (2020) [52] shows that the RUSBoost algorithm can build a better financial fraud identification model from raw financial statement features. In view of this, this paper tests the generalization performance of that foreign financial statement fraud identification model on Chinese listed companies, using their raw financial statement data from 1998 to 2016. The results show that the model has poor robustness and cannot identify financial statement fraud among Chinese listed companies well. Based on these test results, this paper adjusts and optimizes the data set and the machine learning methods to reflect the actual situation of the Chinese market. The results show that a Stacking model yields a more suitable financial statement fraud identification model for Chinese listed companies. Collectively, this study makes the following contributions: (1) Using the CSMAR database, this paper manually collates a sample data set suitable for research on financial statement fraud identification among Chinese listed companies from 1998 to 2016, and verifies that the foreign financial fraud identification model based on the RUSBoost algorithm cannot achieve good predictive performance in the Chinese market. (2) Taking raw financial statement data as the sample variables, this paper builds a Stacking-based model for identifying financial statement fraud among Chinese listed companies. It demonstrates that using raw data as sample variables is feasible in the Chinese market, achieves good predictive performance, and effectively reduces the time cost and complexity of fraud identification, providing an important reference for identifying financial fraud among Chinese listed companies in practice. (3) Building on the experimental results, a grid search is further used to test the model's potential and to explore the influence of sample distribution on the model. |
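
The abstract treats fraud identification as a two-class problem in which fraud cases are rare, and it refers to RUSBoost, whose name denotes random undersampling combined with boosting. As a minimal sketch of the undersampling step only, and not the pipeline used in this paper, the following shows how majority-class samples might be randomly dropped to rebalance a data set. The helper `random_undersample`, its `ratio` and `seed` parameters, and the toy data are assumptions introduced purely for illustration.

```python
import random
from collections import Counter


def random_undersample(rows, labels, ratio=1.0, seed=0):
    """Randomly drop majority-class (label 0) rows.

    Keeps every minority-class (label 1) row and at most
    ratio * n_minority majority-class rows. Hypothetical helper,
    not taken from the paper.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(majority), int(round(ratio * len(minority))))
    # Sample majority indices without replacement, then restore original order.
    kept = sorted(minority + rng.sample(majority, n_keep))
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Toy data (made up): 95 non-fraud and 5 fraud observations.
rows = [[float(i)] for i in range(100)]
labels = [1 if i % 20 == 0 else 0 for i in range(100)]

balanced_rows, balanced_labels = random_undersample(rows, labels, ratio=1.0)
class_counts = Counter(balanced_labels)
```

In RUSBoost-style training, a resampling step of this kind is typically repeated in each boosting round before the weak learner is fitted, so different majority-class samples are seen across rounds.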