
Multi-level Modality Representation Fusion For Emotion Analysis

Posted on: 2021-07-22    Degree: Master    Type: Thesis
Country: China    Candidate: J Y Zou    Full Text: PDF
GTID: 2518306461970599    Subject: Computer technology
Abstract/Summary:
Emotion recognition is an emerging interdisciplinary research field and one of the key technologies that enables machines to imitate humans. In natural language processing, deep learning and transfer learning techniques have brought great progress in emotion recognition from the text modality. However, with the development of social media platforms such as TikTok, Kwai, and Bilibili, the data available for emotion recognition has gradually shifted from a single text modality to multimodal data combining text, audio, and video. Combining modalities carries more information and compensates for the limitation of relying only on incomplete textual information, which can hinder the decision-making process of emotion recognition. In addition, multimodal emotion recognition has great application potential in healthcare systems (as a tool for psychological analysis), human-computer interaction (accurately grasping user needs), and other areas.

There are currently five key challenges in multimodal emotion recognition: representation, translation, alignment, fusion, and collaborative learning. This thesis focuses on the text and audio modalities and explores three of these challenges in depth: representation, fusion, and collaborative learning. First, effective representation features are extracted from each single modality; on this basis, methods that fully integrate the complementary information between modalities are used to unify the multimodal representations in the same vector space; finally, the practical problem of missing modalities, which requires collaborative learning, is addressed.

This thesis proposes a multi-level, multi-feature audio representation extraction method that combines feature engineering with recurrent neural networks, a fusion strategy based on auxiliary-modality supervised training, and a generative multi-task network, which address the three challenges above respectively. The proposed methods achieve strong emotion recognition performance on public multimodal benchmarks such as IEMOCAP and MELD. The experimental results demonstrate the effectiveness of this work and provide a reference and methodological basis for research on multimodal emotion recognition in artificial intelligence.
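For illustration only, the following is a minimal sketch of the general idea described above, not the thesis's actual architecture: frame-level hand-crafted acoustic features (e.g. MFCCs) are encoded with a recurrent network and fused by concatenation with an utterance-level text vector for emotion classification. All module names, dimensions, and the choice of concatenation fusion are assumptions made for this sketch (PyTorch).

    # Hypothetical sketch of text-audio fusion for emotion recognition;
    # not the thesis's exact method.
    import torch
    import torch.nn as nn

    class AudioEncoder(nn.Module):
        """Encodes a sequence of frame-level acoustic features (e.g. MFCCs) with a BiGRU."""
        def __init__(self, feat_dim=40, hidden=128):
            super().__init__()
            self.rnn = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)

        def forward(self, frames):                      # frames: (batch, time, feat_dim)
            _, h = self.rnn(frames)                     # h: (2, batch, hidden)
            return torch.cat([h[0], h[1]], dim=-1)      # (batch, 2 * hidden)

    class TextAudioFusion(nn.Module):
        """Concatenation-based fusion of a text vector and an audio vector."""
        def __init__(self, text_dim=768, audio_dim=256, num_classes=6):
            super().__init__()
            self.audio_enc = AudioEncoder()
            self.classifier = nn.Sequential(
                nn.Linear(text_dim + audio_dim, 256),
                nn.ReLU(),
                nn.Linear(256, num_classes),
            )

        def forward(self, text_vec, audio_frames):
            audio_vec = self.audio_enc(audio_frames)            # audio branch
            fused = torch.cat([text_vec, audio_vec], dim=-1)    # simple fusion
            return self.classifier(fused)                       # emotion logits

    # Example: 4 utterances, 100 frames of 40-dim acoustic features, and
    # 768-dim text vectors (e.g. from a pretrained language model).
    model = TextAudioFusion()
    logits = model(torch.randn(4, 768), torch.randn(4, 100, 40))
    print(logits.shape)                                          # torch.Size([4, 6])

In practice, the number of emotion classes would match the target dataset (for example, the label sets of IEMOCAP or MELD), and the fusion operator could be replaced by the auxiliary-modality supervised strategy the thesis describes.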
Keywords/Search Tags:Computer Neural Network, Emotion Recognition, Multi-Modal, Feature Extraction, Multi-Modal Fusion