Emotion recognition typically draws on multiple information sources, such as physiological signals and behavioral features, to infer emotional categories. Multi-modal emotion recognition based on audio and video has received widespread attention owing to its robustness. However, most existing methods do not fully consider the temporal characteristics of each modality or the complementary nature of modality information, which makes it difficult to integrate features from different modalities efficiently. In addition, the diversity of subject identities introduces interference factors into model learning, making significant accuracy improvements difficult to achieve. To overcome these obstacles, this research makes two main contributions:

(1) This paper proposes a multi-modal emotion recognition method that combines attention mechanisms with a dual-sequence LSTM network. Several types of attention are added to better capture information relevant to audio-visual emotion recognition. First, for the video branch, an efficient ResNeXt50 network is combined with a coordinate attention mechanism to capture the positional information and long-range spatial dependencies of the video frame sequence. For the audio branch, a one-dimensional CNN with a self-attention mechanism learns semantic features. Second, the features of the two modalities are processed separately by an embedded dual-sequence LSTM network with self-attention, and the fused representation is used to generate the final emotion prediction. The self-attention mechanism and the dual-sequence LSTM network ensure the complementarity and completeness of the modality features, while pairing each feature extraction network with a suitable attention mechanism lets each branch express its most discriminative features. Comparative experiments on two datasets, RAVDESS and eNTERFACE’05, together with ablation experiments, verify that the proposed algorithm accurately processes temporal and complementary information and reduces redundant information in the fused features.

(2) This paper proposes a multi-task feature-space decoupling method for audio-visual emotion recognition, which reduces the influence of identity-related representations on emotion classification by decoupling them from the audio-visual features. First, emotion and identity encoders map the fused audio-visual features into separate task-specific latent spaces. Then, a multi-task training scheme learns emotion and identity latent representations, and an emotion-identity coupling loss function measures the classification loss of the emotion and identity recognition tasks. The weight of each task is updated dynamically in an adaptive manner to guide model parameter learning and improve classification accuracy. Experiments on the RAVDESS and eNTERFACE’05 datasets, along with ablation and feature-visualization experiments, verify that the multi-task feature-space decoupling method improves emotion recognition accuracy by weakening the coupling between emotional and identity-related features.
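The abstract does not give implementation details for the self-attention applied to the per-modality feature sequences before the dual-sequence LSTM. As a minimal sketch of standard scaled dot-product self-attention over a sequence of frame-level features (the dimensions and projection matrices are illustrative assumptions, not the author's exact configuration):

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a feature sequence.

    x: (T, d) sequence of per-frame modality features.
    w_q, w_k, w_v: (d, d) query/key/value projection matrices.
    Returns the attended sequence (T, d) and the (T, T) attention map.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(x.shape[1])       # pairwise frame affinities
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # each row sums to 1
    return attn @ v, attn

# Illustrative sizes: 8 time steps of 16-dim audio (or video) features.
rng = np.random.default_rng(0)
T, d = 8, 16
x = rng.standard_normal((T, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
out, attn = self_attention(x, w_q, w_k, w_v)
```

Each modality's attended sequence would then be fed to its LSTM stream, so every output step is a weighted mixture of all time steps rather than a single frame.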
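The abstract states that the task weights in the emotion-identity coupling loss are updated adaptively but does not specify the rule. One common scheme, used here purely as an assumption, weights each task by its recent relative loss descent (softmax over loss ratios, in the style of dynamic weight averaging), so a task whose loss is falling more slowly receives more weight:

```python
import numpy as np

def adaptive_task_weights(loss_hist, temperature=2.0):
    """Dynamic weights for a two-task coupling loss (assumed scheme).

    loss_hist: dict mapping task name -> [loss at step t-2, loss at step t-1].
    A task whose loss ratio L(t-1)/L(t-2) is closer to 1 (slow descent)
    receives a larger weight; weights are normalized to sum to the task count.
    """
    ratios = {t: h[-1] / h[-2] for t, h in loss_hist.items()}
    exps = {t: np.exp(r / temperature) for t, r in ratios.items()}
    z = sum(exps.values())
    n = len(exps)
    return {t: n * e / z for t, e in exps.items()}

def coupling_loss(l_emotion, l_identity, weights):
    """Weighted sum of the emotion and identity classification losses."""
    return weights["emotion"] * l_emotion + weights["identity"] * l_identity

# Emotion loss fell faster (1.00 -> 0.90) than identity loss (0.80 -> 0.78),
# so the identity task should receive the larger weight this step.
hist = {"emotion": [1.00, 0.90], "identity": [0.80, 0.78]}
w = adaptive_task_weights(hist)
total = coupling_loss(0.90, 0.78, w)
```

In the described method these weights would rebalance the joint loss each epoch, steering the shared encoder away from over-fitting identity cues while the decoupled latent spaces keep the two tasks separated.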