Emotion recognition in conversations (ERC) is an active research topic in natural language processing. It classifies the emotion of every utterance in a dialogue by mining the emotional information the utterance contains. In recent years, the task has found wide use in emerging applications such as human-computer interaction. Compared with emotion recognition on text alone, multimodal emotion recognition adds the video and audio modalities as supplementary information. This work makes two main contributions. (1) We propose an attention-based multimodal fusion method that fully accounts for the influence of cross-modal interaction on ERC. Specifically, feature representations for the text, audio, and video modalities are extracted from the videos, and multi-head attention captures the interactive information among these modalities, so that each modality can receive information from the others. In addition, since the text modality carries the most effective information for ERC, the method retains the original textual features while learning supplementary information from the other modalities. Experimental results show that our method performs strongly on the mainstream IEMOCAP and MELD databases. (2) We further propose an Interactive Multimodal Attention Network (IMAN), which accounts for the influence of long-range context and speaker dependency on emotion classification. The network first applies the multimodal fusion method to obtain a refined utterance representation that contains the cross-modal interactive information. We then construct a conversational modeling network with three gated recurrent units (GRUs): a context GRU, a speaker GRU, and an emotion GRU. The network uses contextual information and speaker dependency to update the current utterance features and recognizes the emotion of each utterance in the dialogue. Detailed evaluations on the IEMOCAP and MELD databases demonstrate that our IMAN outperforms state-of-the-art approaches.
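The cross-modal fusion described in contribution (1) can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: it omits the learned query/key/value projection matrices of full multi-head attention for clarity, and the sequence length, dimension, and the residual-sum combination are assumptions chosen only to show the idea that text queries attend over audio and video while the original textual features are retained.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_cross_attention(query, key, value, num_heads):
    """Split features into heads; each head does scaled dot-product attention.
    Learned projections are omitted for brevity (illustration only)."""
    d = query.shape[1]
    dh = d // num_heads
    out = np.zeros_like(query)
    for h in range(num_heads):
        sl = slice(h * dh, (h + 1) * dh)
        scores = softmax(query[:, sl] @ key[:, sl].T / np.sqrt(dh))
        out[:, sl] = scores @ value[:, sl]  # weighted sum of the other modality
    return out

rng = np.random.default_rng(0)
T, d = 5, 8                                # 5 utterances, 8-dim features (assumed)
text = rng.standard_normal((T, d))
audio = rng.standard_normal((T, d))
video = rng.standard_normal((T, d))

# Text queries gather supplementary information from audio and video.
text_from_audio = multi_head_cross_attention(text, audio, audio, num_heads=2)
text_from_video = multi_head_cross_attention(text, video, video, num_heads=2)

# Residual connection retains the original textual information.
fused_text = text + text_from_audio + text_from_video
```

The residual sum in the last line mirrors the abstract's point that textual features are kept intact and only supplemented, rather than replaced, by the other modalities.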
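The conversational modeling in contribution (2) can also be sketched. The abstract does not specify how IMAN wires its three GRUs together, so everything below is an assumption for illustration: a minimal randomly initialized GRU cell, a global state for the context GRU, one hidden state per speaker for the speaker GRU, a plain sum to update the current utterance features, and an emotion GRU run over the result.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell with random weights (illustration only, no biases)."""
    def __init__(self, d, rng):
        s = 1.0 / np.sqrt(d)
        self.Wz = rng.uniform(-s, s, (d, 2 * d))  # update gate
        self.Wr = rng.uniform(-s, s, (d, 2 * d))  # reset gate
        self.Wh = rng.uniform(-s, s, (d, 2 * d))  # candidate state

    def step(self, h, x):
        hx = np.concatenate([h, x])
        z = sigmoid(self.Wz @ hx)
        r = sigmoid(self.Wr @ hx)
        h_tilde = np.tanh(self.Wh @ np.concatenate([r * h, x]))
        return (1 - z) * h + z * h_tilde

rng = np.random.default_rng(0)
d = 8
utterances = rng.standard_normal((6, d))   # fused utterance features (assumed)
speakers = [0, 1, 0, 1, 1, 0]              # speaker of each utterance (assumed)

ctx_gru, spk_gru, emo_gru = (GRUCell(d, rng) for _ in range(3))
ctx_state = np.zeros(d)                    # global conversational context
spk_states = {}                            # one hidden state per speaker
emo_state = np.zeros(d)
emotion_features = []
for x, s in zip(utterances, speakers):
    # Context GRU tracks the whole dialogue; speaker GRU tracks each speaker.
    ctx_state = ctx_gru.step(ctx_state, x)
    spk_states[s] = spk_gru.step(spk_states.get(s, np.zeros(d)), x)
    # Update current utterance features with context and speaker dependency
    # (combination by sum is an assumption made for this sketch).
    u = x + ctx_state + spk_states[s]
    # Emotion GRU produces the representation used for classification.
    emo_state = emo_gru.step(emo_state, u)
    emotion_features.append(emo_state)
```

In a real model each `emotion_features[t]` would feed a softmax classifier over the emotion labels; the per-speaker dictionary is one simple way to realize the speaker dependency the abstract describes.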