Study On Multimodal Emotional Irony Detection Method Based On Image And Text Fusion

Posted on:2024-04-24

Degree:Master

Type:Thesis

Country:China

Candidate:S Li

GTID:2568307058971819

Subject:Electronic information

Abstract/Summary:

Sentiment is the subjective emotional experience that people have toward things, events, and individuals. Irony is a form of implicit emotional expression that uses ironic, contradictory language to reflect people's feelings in a particular situation. Irony detection aims to recover people's true sentiment by technical means. It is an important topic in sentiment analysis and one of the active problems in natural language processing research. Early work on irony detection relied mostly on text-based analysis. In recent years, with the rapid development of new media technologies, more and more internet users express their emotions through multimodal content such as images, text, and video. Multimodal irony conveyed through images and text achieves its effect through contradictions and conflicts between the image and the text. Incorporating image information into the irony detection task is therefore a growing trend.

Previous multimodal fusion methods in this field, based on feature fusion and decision fusion, neglected the interaction between modalities, causing data to go missing during the fusion process. They also had difficulty identifying information conflicts within each modality and inconsistencies between modalities, which resulted in low prediction accuracy on multimodal irony recognition tasks. To address these problems, the main work and contributions of this thesis are as follows:

(1) Adjective-Noun Pairs (ANPs) are used to represent the semantic information of images. To make better use of images in multimodal irony detection, the Visual Sentiment Ontology (VSO) and the DeepSentiBank detector, both built on deep learning for images, are adopted to detect the semantic information of images. This information is represented as ANPs and fed into the network model as image attribute features. ANPs describe the attributes and features of images more accurately and handle complex semantic information more easily, which overcomes the limitations of earlier single-entity features and improves irony detection performance.

(2) The Share and Compare Fusion Network (SCFNet) model is proposed. In SCFNet, the shared fusion network maps the text feature vector and the image feature vector pairwise so that the features of the two modalities complement each other. This yields the interaction information between the two modalities and compensates for the data missing during fusion. The contrastive fusion network uses an interactive attention matrix to fuse text features with image attribute features, and applies a nonlinear function to perceive the contrast between image and text. It thereby captures ironic features across modalities and provides inference ability and semantic interpretation for multimodal irony detection. On the multimodal irony dataset, SCFNet improves accuracy and F1 score by 1.57% and 1.42%, respectively.

(3) A Cross Modal Attention Fusion Network (CMAFNet) model is proposed, consisting mainly of a self-attention fusion network and a cross-attention fusion network. The self-attention fusion network analyzes the relationships within the text and image modalities, embeds the attention mechanism into the Transformer to explore inconsistency within each modality, and uses convolutional neural networks and pooling operations to aggregate these features into global features. The cross-attention fusion network models the relationship between image region features and text features, analyzes the interaction between modalities through cross-modal fusion, and captures the inconsistency between image and text information to recognize irony. Finally, two embeddings are obtained for the given image and text to analyze how well the image and text information match. CMAFNet outperforms the baseline model on the irony dataset.

Experiments on a publicly available multimodal irony dataset demonstrate the effectiveness of the two proposed multimodal irony detection methods based on image-text fusion. These methods offer useful guidance and broad application value for multimodal sentiment irony detection tasks.
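To make the idea of cross-modal attention between text features and image region features more concrete, the following is a minimal sketch in plain Python. It is not the thesis's SCFNet or CMAFNet implementation: the function names, the toy feature vectors, and the use of unprojected scaled dot-product attention (no learned query, key, or value weights) are all assumptions made for illustration. Each text token attends over the image regions, producing one row of attention weights per token and an image-informed context vector.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross_attention(text_feats, image_feats):
    """Hypothetical helper: each text vector (query) attends over all
    image region vectors (keys and values). Assumes both modalities
    already live in the same d-dimensional space; a real model would
    apply learned projections first."""
    d = len(text_feats[0])
    scale = math.sqrt(d)
    attended, weights = [], []
    for q in text_feats:
        # Similarity of this text token to every image region.
        scores = [dot(q, k) / scale for k in image_feats]
        w = softmax(scores)
        # Weighted sum of image region vectors.
        ctx = [sum(w_j * k[c] for w_j, k in zip(w, image_feats))
               for c in range(d)]
        weights.append(w)
        attended.append(ctx)
    return attended, weights


# Toy inputs (illustrative only): 3 text tokens, 2 image regions, d = 2.
text_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_feats = [[1.0, 0.0], [0.0, 1.0]]
attended, weights = cross_attention(text_feats, image_feats)
```

The `weights` matrix (tokens by regions) plays the role of an interaction matrix between the two modalities. A model in the spirit of the abstract could pass such weights or the attended vectors through a nonlinear layer to score image-text inconsistency, but that step is left out here because the abstract does not specify it.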

Keywords/Search Tags:

Multimodal, Irony Detection, Adjective-Noun Pairs, Share and Compare, Fusion Network, Self-Attention, Cross-Attention

Related items

1	Research On Key Technologies Of Multimodal Emotion Recognition Based On Speech Signals
2	Multimodal Sentiment Analysis Based On Attention And Fusion
3	Research On False Information Detection Based On Multimodal Event Memory Network
4	Research On Multimodal Deep Learning Algorithm Based On Attention Mechanism
5	Cross-platform Fusion Of Multimodal Features Design And Implementation Of User Alignment System
6	Research On 3D Object Detection Based On Label Guidance
7	Research On Multimodal Sentiment Analysis Via Hierarchical Cross-Modal Attention
8	Discriminative Emotional Detection For Multimodal High-Level Semantics
9	Research Of Video Temporal Activity Retrieval Based On Attention Networks
10	Research On Fake News Detection Algorithm Based On Multimodal Fusion