
Multimodal Emotion Recognition Algorithm Based On Deep Learning

Posted on: 2021-10-05
Degree: Master
Type: Thesis
Country: China
Candidate: J Fu
Full Text: PDF
GTID: 2518306476950849
Subject: Electronics and Communications Engineering
Abstract/Summary:
As an important basis of human life experience, emotion affects human cognition, perception and daily life. Emotion recognition is therefore an important research area in human-computer interaction and has received increasing attention in recent years. Because emotions can be expressed in many ways, multimodal emotion recognition has become a focus of development in this field. Based on the speech and facial expression modalities, this thesis studies speech emotion recognition and facial expression recognition, and uses feature fusion and decision fusion methods to achieve multimodal emotion recognition. The main work is as follows:

(1) The emotional features of speech are studied, including prosodic features, spectrum-related features and voice quality features. SVM, RF, KNN and DNN models that take high-level statistical functions (HSF) of speech features as input, and three LSTM frameworks that take low-level descriptor (LLD) speech features as input, are analyzed. In addition, an algorithm combining time-domain convolution based on a gated residual network with an attention-based LSTM model is proposed for the speech emotion recognition task. The performance of these speech emotion recognition algorithms is compared experimentally. The experiments establish the importance of data normalization for the machine learning algorithms and examine the performance of the attention-based LSTM among the three LSTM frameworks. The results show that the proposed combination of time-domain convolution and attention-based LSTM further improves speech emotion recognition performance on the eNTERFACE'05 dataset.
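To make the proposed speech model concrete, the following is a minimal PyTorch sketch of a gated-residual time-domain convolution front end followed by a bidirectional LSTM with attention pooling. It is an illustration only: the number of blocks, channel widths, kernel size, the 32-dimensional LLD input and the six-class output are assumptions, not the exact configuration used in the thesis.

import torch
import torch.nn as nn

class GatedResidualConv(nn.Module):
    # One gated 1-D convolution block with a residual connection (assumed structure).
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.conv = nn.Conv1d(channels, channels, kernel_size, padding=pad)
        self.gate = nn.Conv1d(channels, channels, kernel_size, padding=pad)

    def forward(self, x):                      # x: (batch, channels, time)
        return x + torch.tanh(self.conv(x)) * torch.sigmoid(self.gate(x))

class ConvAttnLSTM(nn.Module):
    # Time-domain convolution front end + attention-based bidirectional LSTM classifier.
    def __init__(self, n_lld=32, channels=64, hidden=128, n_classes=6):
        super().__init__()
        self.proj = nn.Conv1d(n_lld, channels, kernel_size=1)
        self.tcn = nn.Sequential(GatedResidualConv(channels), GatedResidualConv(channels))
        self.lstm = nn.LSTM(channels, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.cls = nn.Linear(2 * hidden, n_classes)

    def forward(self, lld):                    # lld: (batch, time, n_lld) frame-level features
        h = self.tcn(self.proj(lld.transpose(1, 2))).transpose(1, 2)
        out, _ = self.lstm(h)                  # (batch, time, 2*hidden)
        w = torch.softmax(self.attn(out).squeeze(-1), dim=1)   # one attention weight per frame
        utt = (w.unsqueeze(-1) * out).sum(dim=1)               # attention-weighted utterance vector
        return self.cls(utt)

logits = ConvAttnLSTM()(torch.randn(4, 300, 32))   # 4 utterances, 300 frames, 32 LLDs each

The attention layer assigns one weight per frame, so the utterance-level representation emphasizes emotionally salient frames before classification.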
(2) Based on the CK+ and FER+ facial expression databases, static facial expression recognition with convolutional neural network models based on VGGNet and MobileNet is studied. It is verified that the MobileNet structure reduces the number of model parameters while maintaining effective recognition accuracy, and thus has certain advantages in facial expression recognition tasks. On this basis, a video sequence expression recognition method combining a convolutional neural network with an attention-based long short-term memory network is studied, and its performance is verified experimentally on the eNTERFACE'05 multimodal emotion dataset.

(3) Two fusion strategies are studied: feature fusion, which combines the high-dimensional emotion features extracted by the speech emotion recognition model with those of the facial expression recognition model or a text emotion recognition model, and decision fusion based on the average rule, the weighted-sum rule and the product rule. In addition, a method that uses the speech model to extract emotional key frames and then performs frame-level feature fusion is proposed. The average recognition rates and confusion matrices of the two multimodal fusion methods on the eNTERFACE'05 and IEMOCAP multimodal emotion datasets are analyzed, together with the confusion matrices of the single-modal methods. The experimental results show that feature fusion with high-dimensional emotion features has certain advantages over decision fusion, and that multimodal emotion recognition significantly outperforms single-modal emotion recognition. The effectiveness of the proposed emotional key frame extraction and frame-level feature fusion method is also verified: it achieves the highest average recognition rate of 91.53% on the eNTERFACE'05 dataset.
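To illustrate the two fusion strategies of part (3), the following NumPy sketch shows feature fusion by concatenating the high-dimensional single-modal embeddings, and decision fusion of class posteriors under the average, weighted-sum and product rules. The embedding sizes, class count and fusion weights are illustrative assumptions.

import numpy as np

def decision_fusion(p_speech, p_face, rule="average", w=(0.5, 0.5)):
    # Fuse per-class posterior probabilities from the speech and face models.
    if rule == "average":
        p = (p_speech + p_face) / 2.0
    elif rule == "weighted":
        p = w[0] * p_speech + w[1] * p_face
    elif rule == "product":
        p = p_speech * p_face
        p = p / p.sum()                        # renormalize after multiplying
    else:
        raise ValueError("unknown rule: " + rule)
    return int(np.argmax(p)), p

# Feature fusion: concatenate the high-dimensional embeddings produced by the
# single-modal networks and feed the joint vector to a shared classifier.
speech_feat = np.random.randn(256)             # assumed speech embedding size
face_feat = np.random.randn(512)               # assumed facial embedding size
fused_feat = np.concatenate([speech_feat, face_feat])

label, probs = decision_fusion(np.array([0.1, 0.6, 0.3]),
                               np.array([0.2, 0.5, 0.3]), rule="product")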
Keywords/Search Tags: multimodal emotion recognition, deep learning, attention mechanism, time domain convolution, MobileNet