Research On Emotion Recognition Based On Multi-modal Feature Fusion

Posted on: 2020-12-06    Degree: Master    Type: Thesis
Country: China    Candidate: Y Q Feng    Full Text: PDF
GTID: 2438330578977076    Subject: Education Technology

Abstract
Emotion recognition plays an important role in human-computer interaction. Generally speaking, people's emotions are expressed mainly through facial expressions, gestures, and speech. As one of the most important channels of human expression, speech conveys emotion effectively and has been used successfully in automatic emotion recognition. However, speech is only one mode of emotional expression and does not carry all of the emotional information; text can also convey the speaker's feelings. Emotion recognition based on multi-modal feature fusion is therefore an important research direction.

The objective of this study is to improve the accuracy of emotion recognition by fusing speech and text features. To this end, the following experiments were designed. First, the speech data were preprocessed, low-level acoustic features were extracted, and statistical functionals were applied to these low-level features to construct global acoustic features, which were then used for speech emotion recognition. The model trained on speech alone serves as the baseline system against which the subsequent models are compared. Second, the text sentences were preprocessed and three types of features were extracted, namely bag-of-words features, word vectors, and sentence vectors, for text emotion recognition. The text features with the highest recognition accuracy were selected for subsequent fusion with the speech features. Finally, the speech features and the best-performing text features were fused for emotion recognition, and performance was compared on the IEMOCAP dataset. Two fusion methods were used: feature-level fusion and decision-level fusion. The study then compared the emotion recognition results after fusing speech and text features with the results of the single speech channel, and compared the influence of
the fusion method on the recognition results.

The experimental results show that the emotion recognition model trained on fused speech and text features achieves better recognition performance and higher accuracy than models trained on single-modality features. Specifically, the recognition rate of the fused speech-and-text model is higher than that of both the speech-only and the text-only emotion recognition models. Furthermore, decision-level fusion outperforms feature-level fusion: the recognition rate of the model fused at the decision level is higher than that of the model fused at the feature level. In general, compared with single-modality speech or text emotion recognition, multi-modal feature fusion effectively improves the accuracy of emotion recognition.
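The pipeline described above can be sketched in code. This is an illustrative sketch, not the thesis's implementation: the classifier (scikit-learn's LogisticRegression), the toy data, and the particular statistical functionals (mean, standard deviation, minimum, maximum) are all assumptions chosen to show how global acoustic features are built and how feature-level fusion (concatenation) differs from decision-level fusion (averaging class probabilities).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def global_acoustic_features(lld):
    """Build one fixed-length utterance vector from frame-level
    low-level descriptors (lld: shape (n_frames, n_descriptors))
    by applying statistical functionals along the time axis."""
    funcs = [np.mean, np.std, np.min, np.max]
    return np.concatenate([f(lld, axis=0) for f in funcs])

rng = np.random.default_rng(0)
n_utt = 40
# Toy data standing in for real utterances: 13 frame-level acoustic
# descriptors per frame, and a 50-dim text vector (e.g. bag-of-words).
X_speech = np.stack([global_acoustic_features(rng.normal(size=(100, 13)))
                     for _ in range(n_utt)])
X_text = rng.normal(size=(n_utt, 50))
y = np.arange(n_utt) % 4          # 4 emotion classes, all present

# Feature-level fusion: concatenate modalities, train one classifier.
feat_clf = LogisticRegression(max_iter=1000).fit(
    np.hstack([X_speech, X_text]), y)

# Decision-level fusion: one classifier per modality,
# then average the predicted class probabilities.
sp_clf = LogisticRegression(max_iter=1000).fit(X_speech, y)
tx_clf = LogisticRegression(max_iter=1000).fit(X_text, y)
probs = (sp_clf.predict_proba(X_speech) + tx_clf.predict_proba(X_text)) / 2
decision_pred = probs.argmax(axis=1)
```

In a real experiment the classifiers would be evaluated on a held-out split of IEMOCAP rather than on the training data; the sketch only shows where the two fusion strategies diverge in the pipeline.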
Keywords/Search Tags: speech emotion recognition, text emotion recognition, feature-level fusion, decision-level fusion