
Research on Multimodal Sentiment Analysis

Posted on: 2022-12-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F Y Chen
Full Text: PDF
GTID: 1488306764460114
Subject: Automation Technology
Abstract/Summary:
Human beings constantly express their feelings in everyday communication. As a combination of intelligence quotient and emotional quotient, human intelligence covers objective reality identification and logical calculation as well as emotion-driven decisions and actions. Integrating affective computing with logical computing is therefore an important step for machines on the way towards human intelligence, and the goal of building emotional AI agents has drawn intense interest to research on sentiment analysis. Sentiment analysis is the most basic and important aspect of affective computing: it aims to enable machines to discover and understand the emotional states of human beings. With the rapid growth of the Internet and the proliferation of multimodal data, mining and modelling the emotional elements in multimodal data has become an important research problem in sentiment analysis. Focusing on the multimodal sentiment analysis task, this dissertation revolves around one key problem, multimodal dynamics, and studies two sub-tasks: Emotion Recognition in Conversation (ERC) and Entity-level Sentiment Analysis (ESA). The main contributions are as follows:

(1) This dissertation introduces a hierarchical uncertainty modelling method for ERC. It proposes a regularisation-based attention module perturbed by source-adaptive noise to model context-level uncertainty, and adapts Monte Carlo dropout to a capsule network to model modality-level uncertainty (see the sketch after this summary). Considering the emotion invariance and expression diversity of modalities, it further proposes a framework that explores latent modal equilibrium, built from a weight-sharing triplet structure with conditional layer normalisation and a capsule network. The proposed method improves prediction accuracy and prediction reliability simultaneously.

(2) This dissertation proposes a mechanism that adaptively models multimodal and contextual dynamics in ERC. It presents a differentiable, end-to-end approach that learns module-wise decisions across modalities and conversation flows simultaneously: the model itself decides which sub-modules of which modalities to drop as disturbance, which to keep modality-specific, and which to share across modalities (a sketch of one such decision mechanism follows below). The framework supports adaptive information-sharing patterns and dynamic fusion paths, and captures dynamics in both the spatial and the temporal direction. In addition, two decision learning mechanisms are proposed, and two extra loss functions are developed to capture modal equilibrium. The framework eases the modelling of complex multimodal relations while remaining efficiently scalable in the number of modalities.

(3) This dissertation focuses on graph-based ERC and proposes an adaptive algorithm for multimodal graph fusion, dubbed EGO fusion. Its core idea is to spread a proper amount of multimodal information through the graph while partially suppressing noise: EGO fusion adaptively distils edge-wise multimodal information and learns modality-specific fusion patterns, letting the most essential and informative inter-modal information spread while preserving intra-modal propagation, which yields more thorough multimodal processing. EGO fusion can also serve as a plug-and-play module for other graph-based multimodal tasks and benchmarks (an edge-gating sketch follows below).
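To make the modality-level uncertainty idea in contribution (1) concrete: the dissertation adapts Monte Carlo dropout to a capsule network, whereas the minimal sketch below applies the same Monte Carlo dropout estimate to a plain feed-forward classifier, so every module name and size here is an illustrative assumption rather than the dissertation's architecture.

    import torch
    import torch.nn as nn

    class MCDropoutClassifier(nn.Module):
        """Toy emotion classifier whose dropout stays active at test time."""
        def __init__(self, in_dim: int, n_classes: int, p: float = 0.3):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, 128), nn.ReLU(), nn.Dropout(p),
                nn.Linear(128, n_classes),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.net(x)

    @torch.no_grad()
    def mc_dropout_predict(model: nn.Module, x: torch.Tensor, n_samples: int = 20):
        # Several stochastic forward passes; the spread across passes gives
        # a crude per-class uncertainty estimate for each utterance.
        model.train()  # keep dropout stochastic at inference
        probs = torch.stack([model(x).softmax(dim=-1) for _ in range(n_samples)])
        return probs.mean(0), probs.var(0)

    # High variance flags predictions that should be treated as unreliable.
    mean_p, var_p = mc_dropout_predict(MCDropoutClassifier(64, 6), torch.randn(8, 64))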
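For the module-wise decisions in contribution (2), one common way to make a discrete drop/keep/share choice differentiable and trainable end to end is a Gumbel-softmax relaxation. The sketch below assumes that mechanism; the class name, the three-way decision layout, and the tensor shapes are hypothetical rather than taken from the dissertation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ModuleDecision(nn.Module):
        """Differentiable three-way choice for one sub-module of one modality:
        drop its output, keep it modality-specific, or use the cross-modal
        shared branch. (Hypothetical sketch, not the dissertation's exact design.)"""
        def __init__(self, tau: float = 1.0):
            super().__init__()
            self.logits = nn.Parameter(torch.zeros(3))  # [drop, specific, shared]
            self.tau = tau

        def forward(self, specific: torch.Tensor, shared: torch.Tensor) -> torch.Tensor:
            w = F.gumbel_softmax(self.logits, tau=self.tau)  # soft one-hot over 3 options
            return w[1] * specific + w[2] * shared           # w[0] weights a dropped (zero) output

    # Usage on dummy features from one modality's sub-module:
    dec = ModuleDecision()
    h = dec(torch.randn(8, 128), torch.randn(8, 128))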
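Contribution (3)'s edge-wise distillation can be pictured as a gated message-passing step: a small network scores each inter-modal edge so that only informative cross-modal messages spread, while intra-modal edges propagate unchanged. This is a loose reading of the EGO-fusion idea under stated assumptions; EdgeGatedFusion and all shapes below are invented for illustration.

    import torch
    import torch.nn as nn

    class EdgeGatedFusion(nn.Module):
        """One message-passing step with edge-wise gates on inter-modal edges."""
        def __init__(self, dim: int):
            super().__init__()
            self.gate = nn.Sequential(nn.Linear(2 * dim, 1), nn.Sigmoid())

        def forward(self, h, edges, cross_modal):
            # h: (num_nodes, dim); edges: (num_edges, 2) source->target pairs;
            # cross_modal: (num_edges,) bool mask marking inter-modal edges.
            src, dst = edges[:, 0], edges[:, 1]
            g = self.gate(torch.cat([h[src], h[dst]], dim=-1)).squeeze(-1)
            g = torch.where(cross_modal, g, torch.ones_like(g))  # intra-modal flow untouched
            out = torch.zeros_like(h)
            out.index_add_(0, dst, g.unsqueeze(-1) * h[src])     # aggregate gated messages
            return out

    # Usage: 4 utterance nodes (two modalities), 3 directed edges.
    h = torch.randn(4, 16)
    edges = torch.tensor([[0, 1], [2, 3], [0, 2]])
    cross = torch.tensor([False, False, True])  # only edge 0->2 crosses modalities
    fused = EdgeGatedFusion(16)(h, edges, cross)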
(4) This dissertation studies the cross-modal high-order semantic matching problem in the ESA task. It analyzes the importance of cross-modal semantic correlations and points out that multimodal data in this task tends to exhibit 'weak correlation' alongside higher-level semantic correlations such as 'scene correlation' or 'causal correlation'. On this basis, it adds a cross-modal matching loss to the model as a supplement to the classification loss, forming a multi-task learning framework; under the supervision of the matching loss, the model learns to perceive high-order cross-modal semantic correlations. Considering the particularity of the data in this task, the dissertation proposes a metric loss function for cross-modal matching that focuses on the relative similarity between positive and negative pairs (a sketch of one such ranking loss is given at the end of this abstract). The effectiveness of the proposed method is verified by adding the cross-modal matching loss to three existing ESA models.

Finally, this dissertation briefly summarizes the aforementioned research and outlines directions for extending it in future work.
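Returning to contribution (4): a metric loss on the relative similarity of positive and negative pairs is commonly realised as a bidirectional max-margin ranking loss. The sketch below shows that standard formulation as one plausible instance of the loss described above, not the dissertation's exact function.

    import torch
    import torch.nn.functional as F

    def cross_modal_ranking_loss(img, txt, margin: float = 0.2):
        """Each matched image-text pair should out-score every mismatched
        pair in the batch by a margin, in both retrieval directions."""
        img = F.normalize(img, dim=-1)
        txt = F.normalize(txt, dim=-1)
        sim = img @ txt.t()                             # (batch, batch) cosine similarities
        pos = sim.diag().unsqueeze(1)                   # matched pairs on the diagonal
        cost_t = (margin + sim - pos).clamp(min=0)      # image -> wrong text
        cost_i = (margin + sim - pos.t()).clamp(min=0)  # text -> wrong image
        mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
        return cost_t.masked_fill(mask, 0).mean() + cost_i.masked_fill(mask, 0).mean()

In the multi-task setup described above, such a loss would be added to the classification loss with a weighting coefficient.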
Keywords/Search Tags:emotion recognition in conversation, multimodal dynamics, entity-level sentiment analysis, contextual dynamics, high-order semantic correlation