The development of social media has made it easier to generate and disseminate news. News is no longer limited to text; it now combines multiple modalities such as text, images, and videos. As a result, fake news detection that relies only on single-modal text information can no longer judge the authenticity of news quickly and accurately. In addition, current text-based fake news detection mainly embeds text features and trains a model on them, and cannot extract high-level contextual semantics. Existing multimodal fusion detection algorithms simply concatenate the features of multimodal news, and do not make good use of the complementarity between the contextual semantics of the different modalities. To solve these problems, this paper proposes a multimodal fusion architecture for fake news detection in image-text scenarios. The main work is as follows:

(1) To better obtain the high-level contextual semantics of news for fake news detection, the BERT pre-trained model is applied to the news text, and the [CLS] sentence vector is combined with the average of the last-layer word vectors to obtain a high-level contextual semantic vector of the text. For news images, the VGG19 pre-trained model is applied, and the output vector of the last layer is combined with the feature-map vector of an intermediate layer to obtain a high-level contextual semantic vector of the image. Experiments show that both pre-trained models achieve better fake news detection results when high-level contextual semantics are used, which provides the basis for the subsequent research on fusing multimodal high-level contextual semantics.

(2) A multimodal fake news detection framework based on image-text fusion is designed. The framework includes three ways of fusing high-level contextual semantics: vector concatenation, an attention mechanism, and the Transformer Encoder structure. Specifically, the vector concatenation method concatenates the high-level contextual semantic feature vectors extracted by the above pre-trained models to generate the fused features; the attention-based method first assigns different weights to the extracted high-level contextual semantic feature vectors and then fuses them to generate the fused features; the Transformer Encoder method uses the self-attention mechanism to fuse the extracted high-level contextual semantic feature vectors to generate the fused features. Finally, the learned fused features are fed into a fully connected layer for fake news learning and detection.

(3) Multiple sets of ablation experiments are designed, and the validity of the fake news detection framework is verified using contextual semantic relevance as the detection criterion. Among the three methods, the model trained on the fused features produced by the Transformer Encoder structure performs best, reaching a prediction accuracy of 0.884 on the Weibo dataset. Several groups of detection algorithms based on the BERT model also achieve good results, indicating that the BERT model has a distinct advantage in extracting high-level contextual semantic features.

26 figures, 5 tables, 81 references.
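
The three fusion strategies described in (2) can be illustrated with a minimal sketch. This is not the implementation used in this work: the function names, the toy 4-dimensional vectors, and the dot-product scoring with a `query` vector are all assumptions made for illustration, and the self-attention variant omits the learned projections, multiple heads, residual connections, and feed-forward layers of a real Transformer Encoder. It uses only the Python standard library so the arithmetic is easy to follow.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def concat_fusion(text_vec, image_vec):
    """Vector concatenation: join the two feature vectors end to end."""
    return list(text_vec) + list(image_vec)


def attention_fusion(text_vec, image_vec, query):
    """Attention-weighted fusion (hypothetical scoring rule).

    Each modality is scored by its dot product with `query`; the softmax
    of the scores gives the weights of a weighted sum of the two vectors.
    Both vectors must have the same dimension.
    """
    feats = [text_vec, image_vec]
    weights = softmax([dot(query, f) for f in feats])
    dim = len(text_vec)
    fused = [sum(w * f[i] for w, f in zip(weights, feats)) for i in range(dim)]
    return fused, weights


def self_attention_fusion(text_vec, image_vec):
    """Simplified stand-in for Transformer Encoder fusion.

    Treats the two modality vectors as a two-token sequence, applies
    single-head scaled dot-product self-attention with identity
    projections, then mean-pools the attended tokens.
    """
    tokens = [text_vec, image_vec]
    dim = len(text_vec)
    scale = math.sqrt(dim)
    attended = []
    for q in tokens:
        weights = softmax([dot(q, k) / scale for k in tokens])
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(dim)]
        )
    return [sum(t[i] for t in attended) / len(attended) for i in range(dim)]


# Toy stand-ins for the BERT text vector and the VGG19 image vector.
text_vec = [1.0, 0.0, 2.0, 0.0]
image_vec = [0.0, 1.0, 0.0, 2.0]

print(concat_fusion(text_vec, image_vec))
print(attention_fusion(text_vec, image_vec, query=[1.0, 0.0, 0.0, 0.0]))
print(self_attention_fusion(text_vec, image_vec))
```

In each case the fused vector would then be passed to a fully connected classification layer. Note that concatenation doubles the feature dimension, while the two attention-based variants in this sketch keep the dimension of a single modality.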