
Multimodal Sentiment Analysis for Text, Audio and Video

Posted on: 2022-12-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 2518306746981309
Subject: Computer Software and Application of Computer
Abstract/Summary:
Sentiment analysis is a key technology for human-computer interaction and plays an important role in improving the human-computer interaction environment in the information age. Multimodal data provide more comprehensive and richer emotional information, helping a model capture the emotions hidden in an utterance. Building sentiment analysis on multimodal data inevitably involves intra-modal representation learning and inter-modal fusion and interaction, the two main challenges that determine the performance of multimodal sentiment analysis models. Addressing these two challenges, this thesis first studies modeling for the text + speech bimodal setting and then extends it to a text + speech + video trimodal sentiment analysis model. The main contributions can be summarized as follows:

(1) To achieve more effective inter-modal interaction, this thesis proposes BMAM, a multimodal sentiment analysis model based on a bidirectional mask attention mechanism. The model processes the text and speech modalities simultaneously. For each modality, mask attention dynamically adjusts the attention weights of the current modality by introducing information from the other modality, yielding a more accurate modal representation. These modal representations not only retain the emotion of the modality itself but also incorporate the emotional information of the other modality, helping the model make better emotional decisions. The model is evaluated on the widely used multimodal sentiment analysis dataset IEMOCAP, where its weighted accuracy reaches 74.1%, a significant improvement over existing mainstream methods and evidence that BMAM models inter-modal interaction well.

(2) To learn more comprehensive and richer modal representations, this thesis proposes PSDA, a trimodal sentiment analysis model based on private-shared decoupling and adaptation. PSDA learns two different representations for each modality: a shared representation and a private representation. The shared representation is abstracted as the intention and purpose expressed by the multimodal sequence and captures the consistency across modalities; the private representation is abstracted as the discriminative characteristics unique to each modality, such as the tone of the voice, the style of the text, and the color of the video, and captures the differences between modalities. By learning both the consistency and the dissimilarity across modalities, PSDA provides a diverse view of multimodal sequences. PSDA is evaluated on the sentiment analysis tasks of the CMU-MOSI and CMU-MOSEI datasets, where it improves 7-class classification accuracy by 1.3%-6.2% over the comparison models. Evaluation on the emotion recognition task of IEMOCAP shows that PSDA substantially improves on the bimodal results obtained by BMAM.
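The abstract describes BMAM only at a high level. As a rough illustration of the bidirectional cross-modal attention idea it sketches (each modality's representation is reweighted using information from the other modality), the following Python/PyTorch snippet shows one plausible reading; all class names, arguments, and dimensions are hypothetical and are not taken from the thesis itself.

```python
# Minimal sketch of bidirectional cross-modal attention between text and
# speech features, assuming PyTorch. This is an illustrative reading of the
# mechanism described in the abstract, not the thesis's actual BMAM layer.
import torch
import torch.nn as nn

class BidirectionalCrossModalAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Text queries attend over audio, and audio queries attend over text.
        self.text_to_audio = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.audio_to_text = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_text = nn.LayerNorm(dim)
        self.norm_audio = nn.LayerNorm(dim)

    def forward(self, text, audio, text_pad_mask=None, audio_pad_mask=None):
        # text:  (batch, T_text, dim)   utterance-level text features
        # audio: (batch, T_audio, dim)  utterance-level acoustic features
        text_ctx, _ = self.text_to_audio(text, audio, audio,
                                         key_padding_mask=audio_pad_mask)
        audio_ctx, _ = self.audio_to_text(audio, text, text,
                                          key_padding_mask=text_pad_mask)
        # Residual fusion: each modality keeps its own representation but
        # folds in emotional cues contributed by the other modality.
        return self.norm_text(text + text_ctx), self.norm_audio(audio + audio_ctx)
```

For classification, the two fused sequences could be mean-pooled, concatenated, and passed through a linear layer over the IEMOCAP emotion classes; the abstract does not specify the decision head.

Similarly, the private-shared decoupling in PSDA can be sketched as one shared and one private projection per modality, trained with a consistency term that pulls the shared representations of different modalities together and a difference term that keeps each private representation apart from its shared counterpart. The encoders, feature sizes, and loss forms below are assumptions for illustration, not the thesis's implementation.

```python
# Minimal sketch of private/shared representation decoupling for three
# modalities (text, audio, video), assuming PyTorch. Feature sizes, encoder
# choices, and loss forms are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrivateSharedDecoupler(nn.Module):
    def __init__(self, dims: dict, hid: int):
        super().__init__()
        # dims maps modality name -> input feature size,
        # e.g. {"text": 768, "audio": 74, "video": 35} (sizes are made up).
        self.shared = nn.ModuleDict({m: nn.Linear(d, hid) for m, d in dims.items()})
        self.private = nn.ModuleDict({m: nn.Linear(d, hid) for m, d in dims.items()})

    def forward(self, feats: dict):
        shared = {m: torch.tanh(self.shared[m](x)) for m, x in feats.items()}
        private = {m: torch.tanh(self.private[m](x)) for m, x in feats.items()}
        return shared, private

def decoupling_losses(shared: dict, private: dict):
    mods = list(shared)
    # Consistency: shared representations of different modalities should agree.
    sim = sum(1 - F.cosine_similarity(shared[a], shared[b], dim=-1).mean()
              for i, a in enumerate(mods) for b in mods[i + 1:])
    # Difference: each private representation stays (approximately) orthogonal
    # to its shared counterpart, preserving modality-specific style cues.
    diff = sum((F.normalize(shared[m], dim=-1) *
                F.normalize(private[m], dim=-1)).sum(-1).pow(2).mean()
               for m in mods)
    return sim, diff
```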
Keywords/Search Tags: Multimodal sentiment analysis, Multimodal interaction, Attention mechanism, Multimodal representation learning