
Bimodal Emotion Recognition Based On Deep Learning

Posted on: 2019-08-06
Degree: Master
Type: Thesis
Country: China
Candidate: L Yuan
Full Text: PDF
GTID: 2428330566995894
Subject: Signal and Information Processing

Abstract/Summary:
Emotion recognition is a hot research topic in multimedia information processing, pattern recognition, and computer vision. With the development of deep learning and artificial intelligence, emotion recognition, as a key component of human-computer interaction, is gaining wide attention from researchers. Emotional expression takes many forms, of which facial expression and speech are the two most important carriers, so research on bimodal emotion recognition based on facial expression and speech has important practical significance. This thesis focuses on the two modalities of facial expression and speech and studies the application of deep learning to bimodal emotion recognition. The main work is as follows:

(1) To avoid the complex hand-crafted feature extraction of traditional facial expression recognition, an improved convolutional neural network (CNN) structure based on the classic AlexNet is proposed for facial expression recognition. Considering the insufficient sample sizes of current emotion databases, a facial expression recognition method based on fine-tuning the VGG-Face model is studied, so that even small databases can exploit complex CNNs to obtain better recognition results. Since the change of facial expression is a gradual process, a facial expression recognition method combining a convolutional neural network with a recurrent neural network is studied in order to exploit the correlation between successive facial expressions.

(2) To improve the accuracy of speech emotion recognition, a CNN-based method is proposed that uses the Mel-spectrogram of speech as the network input and then performs emotion classification. As a temporal sequence, the speech signal has strong correlations over time; to exploit the correlation between earlier and later parts of the speech sequence, a speech emotion recognition method based on a bidirectional long short-term memory (BiLSTM) network is put forward. To combine the advantages of both LSTM and CNN, a speech emotion recognition method based on these two network structures is studied.

(3) Bimodal emotion recognition methods based on facial expression and speech are studied. Three kernel-based feature fusion methods and a weighted decision fusion method are analyzed. Experimental results on the eNTERFACE'05, RML, and AFEW 6.0 emotion databases show that the bimodal emotion recognition results obtained by the fusion methods improve measurably over the single-modality results.
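The weighted decision fusion mentioned above can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: it assumes each modality's classifier outputs a per-class probability vector (e.g. from a softmax layer), and the function name, the example weight, and the example class order are all hypothetical.

```python
def weighted_decision_fusion(p_face, p_speech, alpha=0.6):
    """Fuse per-class probabilities from the face and speech classifiers.

    alpha weighs the facial-expression branch; (1 - alpha) weighs speech.
    Returns the fused probability vector and the predicted class index.
    """
    fused = [alpha * f + (1 - alpha) * s for f, s in zip(p_face, p_speech)]
    pred = max(range(len(fused)), key=fused.__getitem__)
    return fused, pred

# Hypothetical outputs over six basic emotions
# (anger, disgust, fear, happiness, sadness, surprise):
p_face = [0.10, 0.05, 0.05, 0.60, 0.10, 0.10]
p_speech = [0.20, 0.05, 0.05, 0.30, 0.30, 0.10]
fused, pred = weighted_decision_fusion(p_face, p_speech, alpha=0.6)
# pred == 3, i.e. "happiness" wins after fusion
```

Because both inputs are probability distributions and the weights sum to one, the fused vector is again a valid distribution; the weight alpha would typically be tuned on a validation set.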
Keywords/Search Tags:Bimodal emotion recognition, facial expression recognition, speech emotion recognition, Convolutional Neural Network, Recurrent Neural Network, feature fusion, decision fusion