| More than 20 years have passed since the Enron financial fraud scandal in the United States. Although China has strengthened capital market regulation and the punishment of violations over these years, financial fraud cases have continued to increase, because the profits from fraud far exceed the costs incurred. For example, recent cases such as Kangmei Pharmaceutical and Luckin Coffee caused a national stir; they seriously damaged the interests of stakeholders, disrupted the order of the socialist market economy, and hindered the healthy development of listed companies in China. How to identify financial fraud by listed companies more efficiently has therefore become a widely discussed and urgent problem. Against this background, a review of the financial fraud identification literature shows that existing indicator systems are mostly built on financial and non-financial indicators and rarely consider text content, and that most scholars build identification models with a single classifier, while few use ensemble learning models. This study therefore selected as fraud samples the listed companies that, from 2014 to 2021, were punished for the first time by the China Securities Regulatory Commission, the Ministry of Finance, or the Shanghai and Shenzhen Stock Exchanges for false records, fictitious profits, fictitious assets, or material omissions, and matched non-fraud companies one-to-one as control samples, giving 372 research samples in total. Next, 43 financial and non-financial indicators were selected. After significance testing, 8 indicators that showed no significant difference between the fraud and non-fraud samples were eliminated, and 11 principal components were extracted from the 
remaining 35 indicators through principal component analysis. In addition, the text of the Management Discussion and Analysis (MD&A) section of the annual report was introduced: through text analysis, emotional polarity, emotional tone, and text readability were constructed as text indicators. Finally, the financial fraud identification model was built on the Stacking ensemble learning algorithm, with a BP neural network, support vector machine, random forest, and AdaBoost as base learners and logistic regression as the meta-learner. Three conclusions are drawn: (1) For both the single-classifier models (BP neural network, support vector machine, random forest, and AdaBoost) and the Stacking ensemble model, identification performance improves after text indicators are introduced, compared with using financial and non-financial indicators alone. (2) The Stacking ensemble model outperforms the four single-classifier models, both when only financial and non-financial indicators are used and when text indicators are added. (3) When emotional polarity, emotional tone, and text readability are each added separately to the financial and non-financial indicators, emotional polarity and text readability effectively improve the model's identification performance, and emotional polarity in particular improves identification accuracy and provides more incremental information. To a certain extent, this study provides a new model for the efficient identification of financial fraud, further explores the use of text in the identification 
of financial fraud, and offers a useful reference for future research in this direction. |
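
The modelling pipeline described above can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic data in place of the study's samples; the column layout (35 indicator columns followed by 3 text-indicator columns), the hyperparameters, the train/test split, and the use of `MLPClassifier` as a stand-in for the BP neural network are all assumptions made for illustration, not the authors' actual settings.

```python
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in (assumption): 372 firms, 35 retained financial and
# non-financial indicators plus 3 text indicators, roughly balanced classes.
X, y = make_classification(n_samples=372, n_features=38, n_informative=12,
                           random_state=0)
indicator_cols = list(range(35))   # hypothetical column layout
text_cols = [35, 36, 37]           # polarity, tone, readability (hypothetical)

# Reduce the 35 indicators to 11 principal components; keep the text
# indicators as separate standardized features.
features = ColumnTransformer([
    ("pca", make_pipeline(StandardScaler(), PCA(n_components=11)), indicator_cols),
    ("text", StandardScaler(), text_cols),
])

# Four base learners; hyperparameters are illustrative assumptions.
base_learners = [
    ("bp", MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)),
    ("svm", SVC(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("ada", AdaBoostClassifier(random_state=0)),
]

# Logistic regression meta-learner trained on out-of-fold base predictions.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(), cv=5)
model = make_pipeline(features, stack)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

Because the data are synthetic, the printed accuracy says nothing about the study's results. To mimic the comparison with and without text indicators, the `"text"` entry could be dropped from `features` and the two fitted models scored on the same held-out split.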