
Research On Multimodal Emotion Recognition In Conversations Based On Deep Learning

Posted on: 2022-07-11
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Shi
Full Text: PDF
GTID: 2558307154976729
Subject: Engineering

Abstract/Summary:
Emotion recognition in conversations (ERC) is a hot topic in natural language processing. The task classifies the emotion of every utterance in a dialogue by mining the emotional information it contains, and in recent years it has been widely applied in emerging areas such as human-computer interaction. Compared with emotion recognition on the text modality alone, multimodal emotion recognition adds the video and audio modalities as supplementary information. This thesis makes the following two contributions:

(1) We propose an attention-based multimodal fusion method that fully accounts for the influence of cross-modal interactive information on ERC. Specifically, feature representations for three modalities, namely text, audio, and video, are extracted from the videos, and multi-head attention captures the interactive information among them, so that every modality can receive information from the others. In addition, since the text modality carries the most effective information for the ERC task, the method retains the original textual information while learning supplementary information from the other modalities. Experimental results show that the method is effective on the mainstream IEMOCAP and MELD databases.

(2) We also propose an Interactive Multimodal Attention Network (IMAN), which accounts for the influence of long-range context and speaker dependency on emotion classification. The network first applies the multimodal fusion method to obtain a refined utterance representation that contains the cross-modal interactive information. We then construct a conversational modeling network built on three gated recurrent units (GRUs): a context GRU, a speaker GRU, and an emotion GRU. The network uses context information and speaker dependency to update the current utterance features and predicts the emotion of each utterance in the dialogue. Detailed evaluations on the IEMOCAP and MELD databases demonstrate that IMAN outperforms state-of-the-art approaches.
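The fusion step described in contribution (1) can be sketched in PyTorch. This is a minimal illustration, not the thesis's actual implementation: the class name `CrossModalFusion`, the feature dimension, and the residual-plus-LayerNorm design are assumptions; the only elements taken from the abstract are multi-head attention over modalities and the retention of the original textual information (here via a residual connection on the text features).

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Hypothetical sketch: text features act as queries and attend over the
    audio and video features; a residual connection preserves the original
    textual information, matching the abstract's description."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text, audio, video):
        # Keys/values come from the supplementary modalities.
        kv = torch.cat([audio, video], dim=1)
        supplementary, _ = self.attn(text, kv, kv)
        # Residual keeps the original text representation intact.
        return self.norm(text + supplementary)

fusion = CrossModalFusion()
t = torch.randn(2, 10, 128)  # (batch, utterances, feature dim) for text
a = torch.randn(2, 10, 128)  # audio features
v = torch.randn(2, 10, 128)  # video features
out = fusion(t, a, v)
print(out.shape)  # torch.Size([2, 10, 128])
```

The fused output has the same shape as the text input, so it can feed directly into a downstream conversational model.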
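Contribution (2)'s three-GRU conversational model can likewise be sketched. The abstract names a context GRU, a speaker GRU, and an emotion GRU but does not specify how they are wired, so the update order below (global context, then per-speaker state, then emotion state feeding a classifier) is an assumed, DialogueRNN-style arrangement; all names, dimensions, and the six-class output are illustrative only.

```python
import torch
import torch.nn as nn

class ConversationModel(nn.Module):
    """Hypothetical sketch of the three-GRU conversational model: a context
    GRU tracks the global dialogue state, a speaker GRU tracks each
    speaker's own state, and an emotion GRU produces the representation
    used for per-utterance emotion classification."""
    def __init__(self, dim=128, n_classes=6, n_speakers=2):
        super().__init__()
        self.n_speakers = n_speakers
        self.context_gru = nn.GRUCell(dim, dim)
        self.speaker_gru = nn.GRUCell(dim, dim)
        self.emotion_gru = nn.GRUCell(dim, dim)
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, utterances, speakers):
        # utterances: (seq_len, dim) fused features for one dialogue;
        # speakers: list of speaker indices, one per utterance.
        dim = utterances.size(1)
        ctx = torch.zeros(1, dim)
        spk = [torch.zeros(1, dim) for _ in range(self.n_speakers)]
        emo = torch.zeros(1, dim)
        logits = []
        for u, s in zip(utterances, speakers):
            u = u.unsqueeze(0)
            ctx = self.context_gru(u, ctx)            # global context state
            spk[s] = self.speaker_gru(u + ctx, spk[s])  # speaker-dependent state
            emo = self.emotion_gru(spk[s], emo)       # emotion state
            logits.append(self.classifier(emo))
        return torch.cat(logits, dim=0)  # (seq_len, n_classes)

model = ConversationModel()
utts = torch.randn(5, 128)                 # 5 fused utterance vectors
logits = model(utts, speakers=[0, 1, 0, 1, 0])
print(logits.shape)  # torch.Size([5, 6])
```

Processing utterances sequentially lets each prediction condition on both the accumulated dialogue context and the current speaker's own history, which is how the abstract motivates the speaker-dependency modeling.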
Keywords/Search Tags:Emotion recognition in conversations, Gated recurrent units, Multimodal, Attention mechanism