Fact verification is the task of judging the veracity of a statement based on evidence in a corpus, and it supports the identification of false information on the Internet. Since tabular data are widely used in applications such as new media platforms, fact verification over tabular data is of great significance for content security governance in cyberspace. Tabular fact verification covers both closed-domain and open-domain scenarios. In the former, the evidence related to the statement is given directly; in the latter, the relevant evidence must first be retrieved from a corpus. However, existing studies of tabular fact verification have two main problems. First, most existing research focuses on the closed-domain scenario. These models address only the verification process, and they suffer from insufficient representation ability and difficulty in reasoning jointly over the statement and the table. Second, models in the open-domain scenario must include both a retrieval process and a verification process, yet existing studies pay little attention to retrieval. This makes it difficult to obtain accurate tabular evidence and limits model accuracy. To address these issues, this thesis studies fact verification in both closed-domain and open-domain scenarios, aiming to provide a complete solution for tabular fact verification. First, this thesis proposes a Graph-Enhanced Cross-Modal Transformer model for fact verification (GECMT), which strengthens the model's ability to represent and reason over tables in the closed-domain scenario. Second, it proposes a fact verification algorithm based on table retrieval and entity graph reasoning to improve claim verification accuracy in the open-domain scenario. Finally, building on these two algorithms, this thesis designs and implements a prototype tabular fact verification system that supports both closed and open domains. The main contributions of this thesis are as follows.

(1) To improve the model's representation learning and reasoning abilities in the closed domain, this thesis proposes GECMT, which consists of a single-modality representation module, an inter-modal interaction module, and a graph-enhanced representation and reasoning module. The single-modality representation module obtains initial encoded representations of tables and texts using DeBERTa pre-trained on large natural language inference datasets. Using a set of cross-modal Transformers, the inter-modal interaction module obtains claim-aware table representations as well as table-aware claim representations. The graph-enhanced representation and reasoning module divides the table into different components, establishes connections between these components, and builds a graph neural network over them, which enhances representation learning and reasoning over tabular data. Finally, a classifier produces the verification result for the claim.

(2) To address the low verification accuracy of open-domain models caused by the lack of an evidence retrieval process, this thesis proposes a fact verification algorithm based on table retrieval and entity graph reasoning. The algorithm consists of a table evidence retrieval module and a statement verification module. The table evidence retrieval module computes a matching score between the statement and each table in the corpus to obtain the tables related to the statement. The statement verification module extracts the entities in the tables and the statement and constructs a graph neural network over them to capture the key information contained in the tables. It further strengthens the interaction between table entities and the statement through attention aggregation and information fusion operations, which improves the verification accuracy of the model.

(3) This thesis designs and implements a prototype fact verification system for tabular data that covers both closed-domain and open-domain scenarios. The system provides web users with a complete tabular fact verification workflow: it can verify statements not only in the specific case where an evidence table is given (i.e., the closed domain) but also in the general case where only a large corpus is given (i.e., the open domain). In addition, the system provides tabular data extraction, review of statement verification history, and visual analysis to help users identify disinformation.
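The inter-modal interaction module in contribution (1) produces claim-aware table representations and table-aware claim representations. The core operation behind this kind of cross-modal Transformer is attention in which tokens of one modality query the tokens of the other. The sketch below shows only that operation, under stated assumptions: the toy 2-dimensional vectors stand in for DeBERTa encodings, and the learned query/key/value projections, multiple heads, residual connections, and feed-forward sublayers of a full Transformer layer are omitted. The function names are hypothetical and do not come from GECMT.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(
    queries: List[Vector], keys: List[Vector], values: List[Vector]
) -> List[Vector]:
    """Scaled dot-product attention in which each token of one modality
    (queries) attends over all tokens of the other modality (keys/values).

    Simplification (assumption): no learned projections, a single head,
    no residual connection or feed-forward sublayer.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the other modality's value vectors.
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
        )
    return outputs


# Toy token vectors standing in for DeBERTa encodings (hypothetical values).
table_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
claim_tokens = [[1.0, 0.0], [0.0, 1.0]]

# Claim-aware table representation: table tokens attend over claim tokens.
claim_aware_table = cross_modal_attention(table_tokens, claim_tokens, claim_tokens)
# Table-aware claim representation: claim tokens attend over table tokens.
table_aware_claim = cross_modal_attention(claim_tokens, table_tokens, table_tokens)
print(claim_aware_table[2])  # [0.5, 0.5]
```

The third table token is equally similar to both claim tokens, so it receives an even mix of them. Running the same function in both directions is what yields the two complementary representations.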
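Contribution (1) also describes a graph-enhanced module that divides a table into components and connects them. The thesis summary does not say which components are used. The sketch below assumes one plausible choice: cell nodes, row nodes, and column nodes, with each cell linked to its row and its column. It then applies one round of unweighted mean-aggregation message passing. The node scheme, the helper names, and the parameter-free aggregation are all assumptions for illustration; a trained graph neural network would use learned weights and nonlinearities.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical node ids: ("cell", r, c), ("row", r, -1), ("col", -1, c).
Node = Tuple[str, int, int]


def build_table_graph(n_rows: int, n_cols: int) -> Dict[Node, List[Node]]:
    """Undirected adjacency list linking every cell to its row node and its column node."""
    adj: Dict[Node, List[Node]] = defaultdict(list)
    for r in range(n_rows):
        for c in range(n_cols):
            cell = ("cell", r, c)
            for component in (("row", r, -1), ("col", -1, c)):
                adj[cell].append(component)
                adj[component].append(cell)
    return dict(adj)


def mean_aggregate(
    adj: Dict[Node, List[Node]], feats: Dict[Node, List[float]]
) -> Dict[Node, List[float]]:
    """One message-passing step: each node's new feature is the mean of its own
    feature and its neighbours' features (no learned weights, an assumption)."""
    updated = {}
    for node, neighbours in adj.items():
        group = [feats[node]] + [feats[n] for n in neighbours]
        dim = len(feats[node])
        updated[node] = [sum(vec[i] for vec in group) / len(group) for i in range(dim)]
    return updated


# Toy 2x2 numeric table; cell features are the cell values, component nodes start at zero.
table = [[3.0, 5.0], [7.0, 9.0]]
graph = build_table_graph(len(table), len(table[0]))
feats = {node: [table[node[1]][node[2]]] if node[0] == "cell" else [0.0] for node in graph}
updated = mean_aggregate(graph, feats)
print(updated[("row", 0, -1)])  # mean of 0.0, 3.0 and 5.0
```

After one step, each row and column node summarizes the cells it contains, and each cell receives information from its row and column. Stacking several such steps lets information flow between cells that share neither a row nor a column.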
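The table evidence retrieval module in contribution (2) computes a matching score between the statement and every table in the corpus, then keeps the related tables. The summary does not specify the scoring function, so the sketch below swaps in Okapi BM25 over linearized tables as a simple lexical stand-in; the thesis's own matcher may well be a learned model. The `(caption, rows)` table format, the toy corpus, and the function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

# A table is represented here as (caption, rows of cell strings): an assumption.
Table = Tuple[str, List[List[str]]]


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def linearize(table: Table) -> List[str]:
    """Flatten the caption and all cells into one token sequence."""
    caption, rows = table
    tokens = tokenize(caption)
    for row in rows:
        for cell in row:
            tokens.extend(tokenize(cell))
    return tokens


def bm25_rank(
    statement: str, corpus: Dict[str, Table], k1: float = 1.5, b: float = 0.75, top_k: int = 2
) -> List[Tuple[str, float]]:
    """Score every table against the statement with Okapi BM25 and
    return the top_k (table_id, score) pairs, best first."""
    docs = {tid: linearize(t) for tid, t in corpus.items()}
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs.values()) / n_docs
    df: Counter = Counter()
    for d in docs.values():
        df.update(set(d))  # document frequency: count each term once per table
    query = tokenize(statement)
    scores = {}
    for tid, d in docs.items():
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            f = tf[term]
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores[tid] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Hypothetical toy corpus.
corpus = {
    "olympics_2008": ("2008 Summer Olympics medal table",
                      [["Nation", "Gold"], ["China", "48"], ["United States", "36"]]),
    "nba_2016": ("2016 NBA Finals",
                 [["Team", "Wins"], ["Cleveland", "4"], ["Golden State", "3"]]),
    "films_1999": ("1999 films", [["Title", "Director"], ["The Matrix", "Wachowskis"]]),
}
statement = "China won 48 gold medals at the 2008 Olympics"
print(bm25_rank(statement, corpus, top_k=1))
```

Lexical scoring like this misses paraphrases ("medals" vs "medal" above), which is one reason a learned matching score can retrieve evidence more accurately.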
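The statement verification module in contribution (2) extracts entities from the tables and the statement, links them in a graph, and then uses attention aggregation and information fusion. The sketch below illustrates these three steps under explicit assumptions: entities are found by whole-phrase string matching of cell text against the statement, entity-entity edges connect cells in the same row or column, attention is a dot-product softmax over entity vectors, and "fusion" is plain concatenation. The helper names and toy embeddings are hypothetical and do not describe the thesis's actual extractor or layers.

```python
import math
import re
from itertools import combinations
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (row index, column index)


def extract_entities(statement: str, rows: List[List[str]]) -> List[Cell]:
    """Hypothetical entity extractor: a non-empty cell counts as an entity if its
    text occurs in the statement as a whole word or phrase (case-insensitive)."""
    text = statement.lower()
    found = []
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            phrase = cell.strip().lower()
            if phrase and re.search(r"\b" + re.escape(phrase) + r"\b", text):
                found.append((r, c))
    return found


def build_entity_graph(entities: List[Cell]) -> Set[Tuple[str, str]]:
    """Undirected edges: the statement node links to every entity, and two
    entities are linked when their cells share a row or a column."""
    edges = {("statement", f"cell{r}_{c}") for r, c in entities}
    for (r1, c1), (r2, c2) in combinations(entities, 2):
        if r1 == r2 or c1 == c2:
            edges.add((f"cell{r1}_{c1}", f"cell{r2}_{c2}"))
    return edges


def attention_aggregate(query: List[float], entity_vecs: List[List[float]]) -> List[float]:
    """Softmax-weighted sum of entity vectors, scored by dot product with the statement vector."""
    scores = [sum(q * e for q, e in zip(query, vec)) for vec in entity_vecs]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * vec[i] for w, vec in zip(weights, entity_vecs)) / total
            for i in range(len(query))]


def fuse(statement_vec: List[float], aggregated: List[float]) -> List[float]:
    """Information fusion by concatenation (a placeholder; a trained model would
    typically pass this through a learned layer)."""
    return statement_vec + aggregated


rows = [["Nation", "Gold"], ["China", "48"], ["United States", "36"]]
entities = extract_entities("China won 48 gold medals", rows)
print(entities)  # [(0, 1), (1, 0), (1, 1)]
edges = build_entity_graph(entities)

# Toy statement vector and one toy vector per extracted entity (hypothetical values).
fused = fuse([1.0, 0.0], attention_aggregate([1.0, 0.0], [[0.0, 1.0], [0.0, 3.0], [0.0, 2.0]]))
print(fused)  # [1.0, 0.0, 0.0, 2.0]
```

In this example the header cell "Gold" and the value "48" are linked because they share a column, so the graph records that 48 is a gold-medal count rather than an arbitrary number.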