Research On Multimodal Sentiment Analysis Based On Joint Learning Of Image-text Features

Posted on: 2023-06-06
Degree: Master
Type: Thesis
Country: China
Candidate: M Yuan
Full Text: PDF
GTID: 2558307097979169
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the rapid development of the Internet and the growing intelligence of mobile devices, people have become increasingly willing to express their opinions online, and those opinions have evolved from plain text into complex multimodal content. Because multimodal opinions vividly reflect users' emotional experiences, effectively exploiting multiple modalities can improve the performance of sentiment classification. Building on image-text features, this thesis applies deep learning to extract features from multiple modalities and perform cross-modal joint learning for coarse-grained document-level and fine-grained target-level sentiment tasks, achieving multimodal sentiment analysis at different granularities. The main contributions are as follows:

(1) Existing image-text sentiment analysis models neither mine the sentiment information in images deeply nor account for the influence of cross-modal correlation on the relationship between modalities. This thesis therefore proposes an Attentive and Adaptive Fusion Network, named AAFNet. First, AAFNet designs a Cross-modality Common Sentiment Extraction Module: self-enhanced text sequence features, which learn inter-word relationships within the text sequence, serve as context, while image-aligned text sequences provide visual-enhanced text sequence features. The context guides the visual-enhanced features to capture the latent emotional information in images and extracts the sentiment features that image and text share. Second, a Multimodal Adaptive Fusion Module explores the impact of image-text correlation on the inter-modality relationship: a gate function dynamically adjusts the relationship between modality representations, and self-attention captures the overall sentiment of the review, achieving adaptive fusion into an accurate multimodal document representation (a minimal sketch of this gated fusion follows the abstract). Finally, experimental results on the Yelp dataset validate the effectiveness of AAFNet on the multimodal document-level sentiment classification task.

(2) Existing multimodal target sentiment analysis models do not make full use of the target and ignore the fact that multiple targets in the same sentence have different contexts. This thesis therefore proposes a Relation Aware Cross-modal Masked Attention model, named RACMA. First, RACMA designs a Relation Aware Context Extraction Module, which introduces target-rooted dependencies to better discriminate among targets within the same sentence. It treats target-enhanced sentence representations as target-extended contexts and adds a mask matrix that guides the image region features to capture information closely related to the target, yielding a target-enhanced high-level context. Second, to ease the difficulty of extracting useful information when short target words interact with images, a Cross-modality Masked Attention Module uses the target-enhanced image region representation to dynamically re-weight the words in target-related image caption features, yielding a target-related low-level context (a sketch of this masked cross-modal attention also follows the abstract). Finally, experimental results on the Twitter-15 and Twitter-17 datasets validate the effectiveness of RACMA on the multimodal target-level sentiment classification task.
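The sketch below illustrates the gated adaptive fusion idea described for AAFNet's Multimodal Adaptive Fusion Module. It is a minimal PyTorch rendering under stated assumptions: the sigmoid gate over concatenated text and image features, the mean-pooled image summary, and the single self-attention layer are illustrative choices, not the thesis's exact architecture.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Gated image-text fusion followed by self-attention (illustrative sketch)."""
    def __init__(self, dim: int = 768, heads: int = 8):
        super().__init__()
        # Gate estimates image-text correlation and scales the visual signal.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        # Self-attention over the fused sequence models the review's overall sentiment.
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, text: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        # text:  (batch, seq_len, dim) text sequence features
        # image: (batch, regions, dim) image region features
        img_summary = image.mean(dim=1, keepdim=True).expand_as(text)
        g = self.gate(torch.cat([text, img_summary], dim=-1))  # values in [0, 1]
        fused = text + g * img_summary  # gate decides how much vision to mix in
        out, _ = self.self_attn(fused, fused, fused)
        return out.mean(dim=1)  # pooled multimodal document representation
```

For example, text features of shape (2, 128, 768) and image region features of shape (2, 49, 768) yield a (2, 768) document vector that a linear classifier can map to sentiment labels.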
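Similarly, the following sketch shows the masked cross-modal attention idea behind RACMA's Cross-modality Masked Attention Module: a mask restricts attention to caption words judged relevant to the target, and target-enhanced image region features re-weight those words. The `relevance_mask` input, the scaled dot-product scoring, and the requirement that each sentence contain at least one target-related word are all illustrative assumptions.

```python
import math
import torch
import torch.nn.functional as F

def masked_cross_attention(regions: torch.Tensor,
                           caption: torch.Tensor,
                           relevance_mask: torch.Tensor) -> torch.Tensor:
    # regions:        (B, R, D) target-enhanced image region features (queries)
    # caption:        (B, L, D) image caption word features (keys/values)
    # relevance_mask: (B, L) bool, True where a caption word is target-related
    #                 (assumes at least one True per row to avoid a NaN softmax)
    d = regions.size(-1)
    scores = regions @ caption.transpose(1, 2) / math.sqrt(d)  # (B, R, L)
    # Mask out caption words unrelated to the target before the softmax.
    scores = scores.masked_fill(~relevance_mask.unsqueeze(1), float('-inf'))
    attn = F.softmax(scores, dim=-1)
    return attn @ caption  # (B, R, D) target-related low-level context
```

Here the boolean mask plays the role of the mask matrix described above: attention weight is redistributed only among target-related caption words, so even short target expressions receive focused low-level context.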
Keywords: Multimodal sentiment analysis, attention mechanism, pre-trained language model, graph attention network