In recent years, public opinion incidents involving "fake news" and "news reversal" have become frequent, and rumors spread with the help of images have become a new form of rumor propagation in the digital era. These incidents are not only a product of the digital era but also reflect the information credibility problem that already existed in the traditional media era. With the widespread use of multimodal data (including images, text, and video), however, rumors spread more widely and cause more harm. Multimodal fusion is therefore becoming increasingly important in rumor detection: only by using multiple kinds of data together can we identify and combat rumors more accurately and protect the public’s right to know and its interests.

The core problem of this thesis is how to fuse the multimodal information (text and images) in posts to improve the accuracy of rumor detection. Existing work has two problems in fusing text and image information. First, most existing methods fuse data by simple concatenation, ignoring the interaction between text and images. Second, as the amount of fused data grows, using a complex fusion network leads to high-dimensional fused data and difficult feature selection, and the additional data types also introduce a large amount of redundant information. Existing fusion methods therefore struggle to fuse data from multiple modalities effectively, which degrades the performance of rumor detection models. To address these problems, this thesis proposes the following solutions:

1. A multi-level graphical (image-text) fusion network is designed to address the problem that the interaction between text and images is easily ignored when fusion is done by simple concatenation. The network fuses the text and image information in social media posts and introduces three encoding mechanisms, namely a global encoder, an attention mechanism, and a convolutional neural network, to model the global information of posts, local information 
assigned different weights, and contextual information. Through multi-level feature fusion, the model establishes a closer relationship between text and images and mines more information from them. The multi-level graphical fusion network thus overcomes the inability of simple concatenation-based fusion to capture the interaction between text and images, and improves the performance of rumor detection models.

2. A staged method for fusing multiple data sources is proposed for rumor detection in social media posts, in response to the excessive data dimensionality, difficult feature selection, and excessive redundant information that tend to arise as the amount of fused data increases. Existing deep learning methods for rumor detection often analyze only the text content of posts, ignoring the complementary information between the post text and the text embedded in images, and do not consider whether the images have been tampered with or whether image and text are inconsistent. To solve these problems, this thesis introduces an external knowledge base to supplement the background information in multimedia posts, and extracts text features, external knowledge features, features of the text embedded in images, frequency-domain features, and visual features. A gating mechanism filters redundant information across the data sources, and a phased fusion approach first fuses some of the modalities and then fuses the result with the remaining modalities. The experimental results show that the proposed method effectively addresses the excessive dimensionality of the fused features and the excessive redundant information, reducing computation while improving the performance of the rumor detection model.

In summary, this thesis proposes two data fusion methods for rumor detection in social media posts. The first approach focuses on multi-level fusion 
of two kinds of features, the visual features of images and text features, aiming to enable full interaction between image and text features and strengthen the connection between images and text; the second approach focuses on fusing the original information of social media posts with external knowledge, using a phased fusion strategy to fuse multiple features efficiently and reduce the dimensionality and computational cost of fusion, and a gating mechanism to filter redundant information. The two fusion methods proposed in this thesis are also applicable to fusion frameworks in other domains, such as sentiment recognition and visual question answering.
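
The gated, phased fusion idea described above can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation: the function names `gated_fuse` and `phased_fusion`, the feature size `d = 8`, the random weights, and the order in which the five feature types are folded in are all assumptions made for illustration. The sketch only shows why a gated combination keeps the fused vector at a fixed dimension, whereas concatenating every source would make it grow with each added modality.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fuse(a, b, W, bias):
    """Fuse two feature vectors of equal size d with an element-wise gate.

    The gate g (values in (0, 1)) is computed from both inputs. The output
    g * a + (1 - g) * b stays at dimension d, while plain concatenation
    would produce a vector of size 2d.
    """
    g = sigmoid(W @ np.concatenate([a, b]) + bias)
    return g * a + (1.0 - g) * b

def phased_fusion(features, params):
    """Fuse features in stages: start with the first one, then fold in
    each remaining feature through its own gate (hypothetical ordering)."""
    fused = features[0]
    for feat, (W, bias) in zip(features[1:], params):
        fused = gated_fuse(fused, feat, W, bias)
    return fused

# Illustrative stand-ins for the five feature types named in the abstract.
rng = np.random.default_rng(0)
d = 8  # assumed common feature size
names = ["text", "knowledge", "embedded_text", "frequency", "visual"]
features = [rng.standard_normal(d) for _ in names]

# One randomly initialised gate per fusion stage (untrained, for shape only).
params = [(0.1 * rng.standard_normal((d, 2 * d)), np.zeros(d))
          for _ in names[1:]]

fused = phased_fusion(features, params)
print(fused.shape)  # (8,)
```

In a trained model the gate weights would be learned, so that the gate can suppress components of a source that duplicate what the already-fused representation carries.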