Research On Identification Of Financial Fraud By Integrating Latent Semantic Features Of Annual Report Text With Accounting Indicators

Posted on: 2024-09-22
Degree: Master
Type: Thesis
Country: China
Candidate: C C Cheng
Full Text: PDF
GTID: 2569307052471584
Subject: Management Science and Engineering
Abstract/Summary:
In recent years there have been numerous cases of financial fraud by listed companies in China and abroad, such as Xin Jiang Ready, Kingenta's "¥20 billion in inflated revenue" and Colin's "fancy shell", as well as US companies such as WorldCom and Wells Fargo, which became embroiled in fraud and, in some cases, were driven to the verge of bankruptcy. Fraud by listed companies not only erodes the credibility of the information disclosed to the market but also seriously harms the legitimate rights and interests of investors, causes wealth losses to society as a whole, and hinders the sustainable and healthy development of the capital market. How to identify and prevent corporate financial fraud is therefore a topic of great concern to both industry and academia. As listed companies diversify their businesses, accounting becomes more complex, frauds become better hidden and manipulation methods more sophisticated, so quantitative financial information alone is not enough to detect and prevent fraud comprehensively.

This paper first takes the annual reports issued by listed companies from 2001 to 2020 as its textual data source, extracts the latent semantics of the annual report text through LDA topic modelling, and constructs an econometric model to verify the relationship between fraud manipulation and these latent semantic features. It then integrates accounting indicators with the latent semantic features and the linguistic features of the text to form a new feature set. Finally, it constructs a Stacking ensemble learning model that combines linear and tree-based models as well as single and combined classifiers, and compares the recognition performance of each model; the Stacking-based classification model achieves higher accuracy than the other models. The results of the study show that:
(1) When companies commit financial fraud, they strategically manipulate the text: the overall amount of risk-related description stays essentially unchanged, but descriptions of idiosyncratic risks decrease relatively while descriptions of non-idiosyncratic risks increase relatively.
(2) When identifying financial fraud, the latent semantics of the annual report provide more information than the linguistic features of the text.
(3) The latent semantic features embedded in the full annual report text provide more incremental information than those of the MD&A text alone.
(4) The Stacking ensemble learning model is significantly more effective than the other classifiers.
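The abstract does not say how the LDA features were computed. As a minimal sketch, assuming the annual-report text has already been segmented into tokens, each report's topic proportions (its "latent semantic features") can be estimated with collapsed Gibbs sampling, one standard way of fitting LDA, and then concatenated with accounting indicators. The functions `lda_gibbs` and `build_features`, the hyperparameters, the toy documents and the indicator values are all hypothetical illustrations, not the thesis's data or code.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic-proportion vector per doc."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]          # doc-topic counts
    nkw = [[0] * V for _ in range(K)]      # topic-word counts
    nk = [0] * K                           # tokens per topic
    z = []                                 # topic assignment of every token

    # Random initial topic for each token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(K)
            z_doc.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the current token from all counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalised.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    # Smoothed document-topic proportions theta_d (each row sums to 1).
    return [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
            for d, doc in enumerate(docs)]


def build_features(theta, accounting):
    """Concatenate each report's topic proportions with its accounting indicators."""
    return [list(t) + list(a) for t, a in zip(theta, accounting)]


# Toy tokenised reports (illustrative only).
docs = [
    ["risk", "litigation", "impairment", "risk"],
    ["revenue", "growth", "customer", "revenue"],
    ["risk", "guarantee", "litigation", "impairment"],
    ["growth", "market", "revenue", "customer"],
]
# Hypothetical indicators per report, e.g. [ROA, leverage]; the values are made up.
accounting = [[0.05, 0.40], [0.12, 0.25], [-0.02, 0.70], [0.09, 0.30]]
theta = lda_gibbs(docs, num_topics=2)
features = build_features(theta, accounting)
```

Each row of `features` then holds K topic shares followed by the accounting indicators, which is the kind of fused input a downstream classifier would consume. A production pipeline would more likely use an existing LDA library and choose K by coherence or perplexity.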
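The abstract describes a Stacking model that combines linear and tree-based learners but does not name the exact models. The following stdlib-only sketch assumes logistic regression as the linear base learner and a depth-1 decision stump standing in for the tree-based learners; out-of-fold base predictions train a logistic-regression meta-learner, which is the core of stacking. All class and function names, the hyperparameters and the synthetic data are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticRegression:
    """Linear learner fitted by batch gradient descent on the log-loss."""

    def __init__(self, lr=0.5, epochs=500):
        self.lr, self.epochs = lr, epochs

    def _score(self, xi):
        return sigmoid(sum(w * x for w, x in zip(self.w, xi)) + self.b)

    def fit(self, X, y):
        n, p = len(X), len(X[0])
        self.w, self.b = [0.0] * p, 0.0
        for _ in range(self.epochs):
            gw, gb = [0.0] * p, 0.0
            for xi, yi in zip(X, y):
                err = self._score(xi) - yi  # d(log-loss)/d(logit)
                for j in range(p):
                    gw[j] += err * xi[j]
                gb += err
            self.w = [w - self.lr * g / n for w, g in zip(self.w, gw)]
            self.b -= self.lr * gb / n
        return self

    def predict_proba(self, X):
        return [self._score(xi) for xi in X]


class DecisionStump:
    """Depth-1 decision tree, a stand-in for richer tree-based models."""

    def fit(self, X, y):
        best = None
        for j in range(len(X[0])):
            for thr in sorted({xi[j] for xi in X}):
                left = [yi for xi, yi in zip(X, y) if xi[j] <= thr]
                right = [yi for xi, yi in zip(X, y) if xi[j] > thr]
                # Leaf outputs: Laplace-smoothed positive rate on each side.
                pl = (sum(left) + 1) / (len(left) + 2)
                pr = (sum(right) + 1) / (len(right) + 2)
                # Size-weighted Gini impurity (up to a factor of 2).
                cost = len(left) * pl * (1 - pl) + len(right) * pr * (1 - pr)
                if best is None or cost < best[0]:
                    best = (cost, j, thr, pl, pr)
        _, self.j, self.thr, self.pl, self.pr = best
        return self

    def predict_proba(self, X):
        return [self.pl if xi[self.j] <= self.thr else self.pr for xi in X]


def stacking_fit(X, y, base_factories, meta, n_folds=5, seed=0):
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    # Out-of-fold base predictions become the meta-learner's training features.
    oof = [[0.0] * len(base_factories) for _ in X]
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        for m, make in enumerate(base_factories):
            model = make().fit([X[i] for i in train], [y[i] for i in train])
            for i, p in zip(fold, model.predict_proba([X[i] for i in fold])):
                oof[i][m] = p
    meta.fit(oof, y)
    # Refit each base learner on all training data for use at prediction time.
    bases = [make().fit(X, y) for make in base_factories]
    return bases, meta


def stacking_predict(bases, meta, X):
    cols = [b.predict_proba(X) for b in bases]
    return meta.predict_proba([list(row) for row in zip(*cols)])


# Synthetic data (not from the thesis): label 1 when x0 + x1 > 1.
rng = random.Random(42)
X = [[rng.random(), rng.random()] for _ in range(60)]
y = [1 if a + b > 1 else 0 for a, b in X]
bases, meta = stacking_fit(X, y, [LogisticRegression, DecisionStump], LogisticRegression())
probs = stacking_predict(bases, meta, X)
```

Training the meta-learner on out-of-fold rather than in-sample base predictions keeps it from simply trusting whichever base model overfits most. With real features, the fused vectors from the previous step would replace `X`, and the base list could be extended with other single or combined classifiers.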
Keywords/Search Tags: Financial fraud, Text mining, Topic extraction, LDA, Stacking