
Research on Multimodal Sentiment Analysis Based on Deep Neural Networks

Posted on: 2024-08-21
Degree: Master
Type: Thesis
Country: China
Candidate: J W Guo
Full Text: PDF
GTID: 2568307103474504
Subject: Computer Science and Technology

Abstract/Summary:
Nowadays, the development of artificial intelligence has entered a new stage, and people expect intelligent systems to interact in a friendly, vivid, natural, and harmonious way, as humans do. To this end, researchers have proposed a new field of computer science, affective computing, whose core idea is to make computers capable of recognizing and expressing emotions like humans, thereby making human-computer interaction more natural. In everyday scenarios, humans express emotions or emphasize particular viewpoints mainly through their voices, their facial expressions, and the content they describe. Emotional expression therefore involves not only verbal information but also the non-verbal behavioral cues, visual and auditory, that accompany speech and that form an important channel of emotional expression. To accurately identify the emotions that humans intend to express, sentiment analysis must draw on multimodal data (mainly the visual, auditory, and verbal modalities), so that computers can be empowered to perceive human emotions.

This thesis focuses on sentiment analysis based on multimodal fusion and aims to address three problems: inadequate cross-modal fusion, heterogeneity of multimodal data sources, and limited representation of modal information. The main work comprises the following three points; a minimal illustrative sketch of each fusion scheme follows the abstract.

(1) To address the simple and insufficient interaction among modalities during cross-modal interaction, this thesis proposes a multimodal fusion model based on a multi-perspective graph attention mechanism. The complex multimodal inputs are modeled on a non-Euclidean (graph) data structure, and the expressive power of graph models is used to capture the potentially complex, multi-relational interactions among the modalities. Multimodal data from different perspectives are transformed into multimodal interaction graphs with heterogeneous nodes, and full interaction between the different modalities is carried out on these graphs, releasing the full expressive power of multimodal interaction. Experiments on a publicly available multimodal sentiment analysis dataset show that the proposed model outperforms previous multimodal fusion algorithms on the sentiment analysis task, validating its effectiveness.

(2) To address unaligned, heterogeneous multimodal data sources, this thesis proposes a Cross Hyper-modality Fusion Network that operates directly on the original unaligned multimodal inputs. Cross hyper-modality interaction between the language modality and the non-verbal behavioral information that accompanies it is performed without any pre-alignment operation. In this interaction, the accompanying non-verbal behavior dynamically adjusts the position of each word in the semantic space, so that the real emotional state the speaker intends to convey can be expressed clearly under different non-verbal behavioral contexts. Notably, the multimodal interaction in the Cross Hyper-modality Fusion Network is a direct interaction between the modalities, which improves the efficiency of multimodal fusion.

(3) To address the restricted representation of modal information during modal fusion, this thesis proposes a unidirectional (one-way) bimodal fusion network. Unlike previous approaches that project all modalities into a single joint representation space, the unidirectional bimodal fusion network first creates an independent representation for each modality and then uses a cross-modal attention mechanism to fuse the verbal-visual and verbal-auditory modality pairs, thereby establishing the dominance of the textual modality in the sentiment analysis task.

In summary, the three multimodal sentiment analysis methods proposed in this thesis effectively address insufficient cross-modal fusion, heterogeneity of multimodal data sources, and restricted modality representation in the multimodal fusion process, and they provide new ideas and techniques for the field of multimodal sentiment analysis.
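The abstract gives no implementation details for point (1); the following is only a rough sketch, assuming PyTorch, utterance-level feature vectors per modality, a fully connected interaction graph with one heterogeneous node per modality, and a single hand-rolled graph-attention layer. The class name GraphAttentionFusion and all dimensions are illustrative assumptions, not the author's design.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GraphAttentionFusion(nn.Module):
        """Toy graph-attention fusion over heterogeneous modality nodes."""
        def __init__(self, dims, hidden=128):
            super().__init__()
            # One projection per modality maps heterogeneous nodes into a shared space.
            self.proj = nn.ModuleList([nn.Linear(d, hidden) for d in dims])
            # GAT-style scoring over concatenated node pairs.
            self.att = nn.Linear(2 * hidden, 1)
            self.out = nn.Linear(hidden, 1)  # sentiment score

        def forward(self, feats):
            # feats: list of (batch, dim_m) utterance-level vectors, one per modality.
            nodes = torch.stack([p(x) for p, x in zip(self.proj, feats)], dim=1)  # (B, M, H)
            B, M, H = nodes.shape
            # Score every ordered node pair on the fully connected interaction graph.
            src = nodes.unsqueeze(2).expand(B, M, M, H)
            dst = nodes.unsqueeze(1).expand(B, M, M, H)
            scores = F.leaky_relu(self.att(torch.cat([src, dst], dim=-1))).squeeze(-1)  # (B, M, M)
            alpha = scores.softmax(dim=-1)
            # Each node aggregates messages from all modalities; mean-pool the graph.
            fused = (alpha.unsqueeze(-1) * dst).sum(dim=2).mean(dim=1)  # (B, H)
            return self.out(fused)

    # Example: text/audio/visual feature vectors of different sizes.
    model = GraphAttentionFusion(dims=[768, 74, 35])
    t, a, v = torch.randn(4, 768), torch.randn(4, 74), torch.randn(4, 35)
    print(model([t, a, v]).shape)  # torch.Size([4, 1])

In the thesis's multi-perspective setting, several such interaction graphs (one per perspective) would presumably be built and their outputs combined; the sketch keeps a single graph for brevity.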
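For point (2), the sketch below illustrates only the core idea: unaligned non-verbal sequences shift word embeddings through cross-modal attention, with no pre-alignment step. It assumes PyTorch's nn.MultiheadAttention with kdim/vdim set to the non-verbal feature size; the class name CrossModalWordShift, the gated residual, and all dimensions are assumptions rather than the thesis's actual architecture.

    import torch
    import torch.nn as nn

    class CrossModalWordShift(nn.Module):
        """Text tokens attend to an unaligned non-verbal sequence; the attention
        output is added to the word embeddings as a shift in semantic space."""
        def __init__(self, text_dim, other_dim, heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(embed_dim=text_dim, num_heads=heads,
                                              kdim=other_dim, vdim=other_dim,
                                              batch_first=True)
            self.gate = nn.Linear(text_dim, text_dim)

        def forward(self, text, other):
            # text:  (batch, L_t, text_dim)   word embeddings
            # other: (batch, L_o, other_dim)  unaligned audio/visual frames (L_o != L_t is fine)
            shift, _ = self.attn(query=text, key=other, value=other)
            # Gated residual: non-verbal context nudges each word's position.
            return text + torch.sigmoid(self.gate(text)) * shift

    # Unaligned lengths: 20 words vs. 50 audio frames and 35 video frames.
    text = torch.randn(2, 20, 300)
    audio, video = torch.randn(2, 50, 74), torch.randn(2, 35, 35)
    shift_a = CrossModalWordShift(300, 74)
    shift_v = CrossModalWordShift(300, 35)
    fused_words = shift_v(shift_a(text, audio), video)  # still (2, 20, 300)
    print(fused_words.shape)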
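For point (3), the sketch below is one plausible reading of the unidirectional bimodal fusion: independent GRU encoders build a separate representation per modality, the textual representation acts as the sole query in two cross-modal attention blocks (verbal-visual and verbal-auditory), and their pooled outputs feed a sentiment head. All module choices, names, and sizes are illustrative assumptions.

    import torch
    import torch.nn as nn

    class UnidirectionalBimodalFusion(nn.Module):
        """Independent per-modality encoders, then one-way cross-modal attention
        in which text queries the visual and acoustic streams separately."""
        def __init__(self, t_dim, a_dim, v_dim, hidden=128, heads=4):
            super().__init__()
            # Independent representations: one GRU encoder per modality.
            self.enc_t = nn.GRU(t_dim, hidden, batch_first=True)
            self.enc_a = nn.GRU(a_dim, hidden, batch_first=True)
            self.enc_v = nn.GRU(v_dim, hidden, batch_first=True)
            # Text is always the query (the dominant modality); fusion is one-way.
            self.t2a = nn.MultiheadAttention(hidden, heads, batch_first=True)
            self.t2v = nn.MultiheadAttention(hidden, heads, batch_first=True)
            self.head = nn.Linear(2 * hidden, 1)

        def forward(self, text, audio, video):
            ht, _ = self.enc_t(text)    # (B, L_t, H)
            ha, _ = self.enc_a(audio)   # (B, L_a, H)
            hv, _ = self.enc_v(video)   # (B, L_v, H)
            ta, _ = self.t2a(ht, ha, ha)  # verbal-auditory fusion
            tv, _ = self.t2v(ht, hv, hv)  # verbal-visual fusion
            # Pool over the text time axis and predict a sentiment score.
            fused = torch.cat([ta.mean(dim=1), tv.mean(dim=1)], dim=-1)
            return self.head(fused)

    model = UnidirectionalBimodalFusion(300, 74, 35)
    out = model(torch.randn(2, 20, 300), torch.randn(2, 50, 74), torch.randn(2, 35, 35))
    print(out.shape)  # torch.Size([2, 1])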
Keywords/Search Tags: Emotion Recognition, Multimodal Sentiment Analysis, Multimodal Fusion, Cross-Modality Interaction, Modality Representation