
Application Research Of Emotion Recognition Based On Deep Learning

Posted on: 2021-01-17
Degree: Master
Type: Thesis
Country: China
Candidate: W J Li
Full Text: PDF
GTID: 2518306470960929
Subject: Electronics and Communications Engineering
Abstract/Summary:
Emotion recognition is an important research topic in the field of human-computer interaction, with applications in education, medical care, safe driving, game development, and other fields. Facial expressions and speech are the two most important channels of human emotion expression, accounting for roughly 55% and 38% of emotional information respectively. Early emotion recognition mainly extracted hand-designed features and then applied traditional machine learning methods to recognize them. However, as requirements for recognition accuracy and robustness have risen with the development of computer technology, traditional machine learning methods have shown their limitations. In recent years, deep learning has performed excellently in many fields, and most current emotion recognition research is based on it.

Emotion recognition based on deep learning usually uses ordinary convolutional neural networks (CNNs), but an ordinary CNN has too many parameters and does not consider the sparse character of emotional information: different parts of a facial expression contribute different amounts of emotional information, as do different time periods of a speech signal, so a traditional CNN is inefficient. Current research covers both single-modal and multi-modal emotion recognition, but existing multi-modal emotion databases are mostly recorded under ideal laboratory conditions and are not suitable for emotion recognition in real-world scenarios; moreover, multi-modal models are usually very large, which makes recognition too time-consuming for a real-time emotion recognition system or for deployment on low-end computers. This thesis therefore focuses on single-modal emotion recognition, studying facial expression recognition and speech emotion recognition separately. The main contents of the work are as follows:

(1) For facial expression recognition, to address the problems that an ordinary CNN has too many parameters and cannot attend to the different contributions of emotional information from different parts of the face, this thesis proposes the SE-Mini-Xception model: the original Xception network is trimmed in depth and combined with an attention module (the SE block), yielding a lightweight convolutional neural network with an attention mechanism. SE-Mini-Xception was verified on the public in-the-wild facial expression databases FERPlus and RAF-DB, achieving recognition accuracies of 82.43% and 84.35% respectively, only 2% to 3% lower than the original Xception model. The Xception model is 239 MB, while SE-Mini-Xception is only 2.71 MB, a large reduction in parameter count. Experiments show that by using separable convolutions and an attention mechanism, SE-Mini-Xception greatly reduces the model size with little loss of performance and can be applied effectively to facial expression recognition.

(2) For speech emotion recognition, to address the problem that an ordinary CNN cannot handle time-series features effectively, this thesis introduces separable convolution and the long short-term memory network (LSTM) and designs the Sep-CNN-LSTM model for speech emotion recognition. Experiments were conducted on the public speech emotion corpus RAVDESS. The raw speech is first processed with endpoint detection and filter-based denoising to obtain valid speech segments, and features are then extracted for recognition. The 1D Sep-CNN-LSTM model trained on Mel-frequency cepstral coefficient (MFCC) features and the 2D Sep-CNN-LSTM model trained on spectrogram features achieved 90.77% and 82.21% recognition accuracy on the test set respectively. Experiments show that the Sep-CNN-LSTM model can be applied effectively to speech emotion recognition.

(3) Based on the SE-Mini-Xception and 1D Sep-CNN-LSTM models proposed in this thesis, a real-time facial expression recognition system and a speech emotion recognition system were designed and implemented, and deployed on a Jetson Nano. Testing shows that both systems can meet basic emotion recognition tasks.
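The SE block used in SE-Mini-Xception follows the squeeze-excite-scale pattern: globally pool each channel, pass the result through a small bottleneck network, and use the sigmoid output to reweight the channels. The following is a minimal NumPy sketch of that pattern only, not the thesis's actual implementation; the reduction ratio, shapes, and random weights are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(feature_map, w1, w2):
    """Squeeze-and-excitation: reweight channels by global context.

    feature_map: (H, W, C) activations from a convolutional layer.
    w1: (C, C // r) squeeze weights; w2: (C // r, C) excite weights.
    (Illustrative sketch; biases omitted.)
    """
    # Squeeze: global average pooling over the spatial dimensions -> (C,)
    z = feature_map.mean(axis=(0, 1))
    # Excite: bottleneck MLP, ReLU then sigmoid -> per-channel gates in (0, 1)
    s = sigmoid(np.maximum(z @ w1, 0.0) @ w2)
    # Scale: multiply each channel of the input by its gate
    return feature_map * s  # broadcasts over (H, W, C)

rng = np.random.default_rng(0)
C, r = 16, 4  # channel count and reduction ratio (assumed values)
x = rng.standard_normal((8, 8, C))
out = se_block(x, rng.standard_normal((C, C // r)), rng.standard_normal((C // r, C)))
print(out.shape)  # (8, 8, 16)
```

Because every gate lies strictly between 0 and 1, the block can only attenuate channels, which is how it focuses the network on the more informative facial regions.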
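The parameter savings from the separable convolutions used in both models come from factoring a standard convolution into a per-channel (depthwise) filter followed by a 1x1 (pointwise) mixing step. A small back-of-the-envelope calculation, with hypothetical channel counts not taken from the thesis, shows the reduction:

```python
def conv_param_counts(c_in, c_out, k):
    """Weight counts (biases ignored) for a k x k convolution layer."""
    standard = k * k * c_in * c_out          # full convolution kernel
    separable = k * k * c_in + c_in * c_out  # depthwise + 1x1 pointwise
    return standard, separable

# Example layer: 128 input channels, 256 output channels, 3x3 kernel
std, sep = conv_param_counts(128, 256, 3)
print(std, sep)            # 294912 33920
print(round(std / sep, 1))  # 8.7x fewer weights
```

The same factoring applies in 1D for the Sep-CNN-LSTM speech model, which is why both networks stay small enough for real-time use on a Jetson Nano.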
Keywords/Search Tags:Facial Expression Recognition, Speech Emotion Recognition, Separable Convolution, Attention Mechanism, LSTM