
Multimodal Emotion Recognition Based On Deep Learning

Posted on: 2022-02-13 | Degree: Master | Type: Thesis
Country: China | Candidate: T Yang | Full Text: PDF
GTID: 2518306539980869 | Subject: Control Engineering
Abstract/Summary:
With the growing demand for automatic emotion recognition systems, deeper research into emotion recognition is becoming increasingly important. In recent years, driven by rapid advances in hardware and deep learning methods, the performance of automatic emotion recognition has improved steadily. However, because emotion is an abstract concept with many forms of expression, automatic emotion recognition remains a challenging task.

Traditional emotion recognition methods face two main problems. First, they focus on extracting various types of handcrafted features, after which the emotional content of a video is annotated manually. Handcrafted features, however, require task-specific domain knowledge, and designing appropriate features can be time-consuming, so exploring effective methods that learn features and classifiers autonomously has become the core problem of most work in this area. Second, most existing work performs single-modality emotion recognition, whereas emotion is in fact expressed through multiple modalities, such as facial expression, voice, and posture, of which voice and facial expression are the most direct. Although a single modality can be used to recognize emotion, it has clear limitations: it is easily affected by the surrounding environment and has low robustness. Exploiting the complementarity of multimodal emotional information to improve recognition has therefore become a current research hotspot.

To address the time-consuming feature extraction, poor robustness, and low recognition rates of traditional methods, this thesis proposes a multimodal emotion recognition method based on deep learning. The main research contents are as follows:

(1) Data preprocessing for each modality's emotion database: for speech signals, the one-dimensional static Mel spectrogram is transformed into a three-dimensional Mel spectrogram through a series of preprocessing operations (a minimal sketch follows the abstract); for expression signals, the recognition region is extracted using a multi-target location detection and recognition algorithm, and redundant information is removed by cropping.

(2) Speech emotion feature extraction: first, an improved convolutional neural network recognition method based on AlexNet is studied. Because speech is a temporal signal, a long short-term memory (LSTM) network is used to capture the word-order relationships in audio. Since an LSTM alone cannot maintain accuracy over long sequences, a feedforward sequential memory network (FSMN) is introduced to represent the temporal features more fully. To combine the advantages of each network and improve recognition, a speech emotion recognition method based on residual convolution, an LSTM network, and an FSMN is proposed (a sketch of this assembly follows the abstract).

(3) Emotion feature extraction from video expressions: this thesis studies an expression recognition method based on a 3D convolutional neural network. A deeper 3D CNN extracts features more effectively, and residual blocks are added to the network to ease the training of the deep model. To address the large number of parameters and low computational efficiency, a pseudo-3D convolutional network is proposed to reduce the parameter count, and dense connections between convolutional blocks, together with parallel feature extraction, are used to enrich the feature information. On this basis, a new parallel dense 3D convolutional residual network is developed (sketched after the abstract).

(4) Multimodal emotion feature fusion: the facial expressions and audio in the multimodal database are treated as two modalities, and fusion methods at the feature level and the decision level are studied to improve on traditional multimodal fusion. A multimodal emotion recognition method that combines feature-level and decision-level fusion is proposed (see the final sketch below). Experimental results show that the multimodal method is effective and that it outperforms single-modality emotion recognition.
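To make step (1) concrete for the speech modality, the following minimal sketch turns a waveform into a three-channel log-Mel representation: the static spectrogram plus its first- and second-order deltas, which is a common reading of "three-dimensional Mel spectrogram". The use of librosa and all parameter values (16 kHz sampling, 64 Mel bands, and so on) are illustrative assumptions, not the thesis's documented settings.

```python
# Sketch: waveform -> 3-channel ("3-D") log-Mel input for a CNN.
# Channels: static log-Mel spectrogram, first-order delta, second-order delta.
# Parameter values are illustrative, not the thesis's exact settings.
import librosa
import numpy as np

def three_channel_mel(path, sr=16000, n_mels=64):
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                         hop_length=256, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel)                 # static channel
    delta1 = librosa.feature.delta(log_mel)            # first-order dynamics
    delta2 = librosa.feature.delta(log_mel, order=2)   # second-order dynamics
    # Shape (3, n_mels, frames): ready for a 2-D CNN with 3 input channels.
    return np.stack([log_mel, delta1, delta2], axis=0)
```

Stacking the deltas as extra channels gives a purely convolutional front end a local notion of spectral dynamics before any recurrent layer is applied.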
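For step (2), one plausible PyTorch assembly of the three named ingredients (residual convolution, LSTM, FSMN) is sketched below. The scalar-FSMN formulation used here, a depthwise temporal convolution with a residual connection, as well as all layer sizes and the class count, are assumptions; the abstract does not specify the thesis's actual architecture.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual convolution block (assumed form)."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))

    def forward(self, x):
        return torch.relu(x + self.body(x))

class FSMNBlock(nn.Module):
    """Scalar-FSMN-style memory block: each frame is augmented with a
    learned weighted sum of neighbouring frames via a depthwise temporal
    convolution -- feedforward, with no recurrence."""
    def __init__(self, dim, context=10):
        super().__init__()
        self.mem = nn.Conv1d(dim, dim, kernel_size=2 * context + 1,
                             padding=context, groups=dim, bias=False)

    def forward(self, x):                      # x: (B, T, D)
        return x + self.mem(x.transpose(1, 2)).transpose(1, 2)

class SpeechEmotionNet(nn.Module):
    """Hypothetical residual-CNN + LSTM + FSMN pipeline for the
    3-channel Mel input of the previous sketch."""
    def __init__(self, n_mels=64, hidden=128, n_classes=7):
        super().__init__()
        self.front = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            ResBlock(32), nn.MaxPool2d(2),
            ResBlock(32), nn.MaxPool2d(2))
        self.lstm = nn.LSTM(32 * (n_mels // 4), hidden,
                            batch_first=True, bidirectional=True)
        self.fsmn = FSMNBlock(2 * hidden)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                      # x: (B, 3, n_mels, T)
        h = self.front(x)                      # (B, 32, n_mels//4, T//4)
        b, c, f, t = h.shape
        h = h.permute(0, 3, 1, 2).reshape(b, t, c * f)  # frames as a sequence
        h, _ = self.lstm(h)                    # short-range temporal modelling
        h = self.fsmn(h)                       # long-context memory, feedforward
        return self.fc(h.mean(dim=1))          # utterance-level logits
```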
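Step (3) hinges on factorising 3D convolutions. The block below sketches the pseudo-3D idea (a 1x3x3 spatial convolution followed by a 3x1x1 temporal one, with a residual connection) and one loose reading of "parallel dense" feature extraction via concatenation. Block counts, channel widths, and the exact dense topology are assumptions, since the abstract does not specify them.

```python
import torch
import torch.nn as nn

class P3DBlock(nn.Module):
    """Pseudo-3D residual block: a full 3x3x3 convolution is factorised
    into a 1x3x3 spatial convolution followed by a 3x1x1 temporal one,
    cutting per-block parameters from 27*C^2 to 12*C^2."""
    def __init__(self, ch):
        super().__init__()
        self.spatial = nn.Conv3d(ch, ch, (1, 3, 3), padding=(0, 1, 1), bias=False)
        self.temporal = nn.Conv3d(ch, ch, (3, 1, 1), padding=(1, 0, 0), bias=False)
        self.bn1, self.bn2 = nn.BatchNorm3d(ch), nn.BatchNorm3d(ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):                      # x: (B, C, T, H, W)
        h = self.act(self.bn1(self.spatial(x)))
        h = self.bn2(self.temporal(h))
        return self.act(x + h)                 # residual connection

class ParallelDenseStage(nn.Module):
    """Two P3D blocks applied in parallel; their outputs, plus the input
    (dense-style feature reuse), are concatenated and fused by a 1x1x1
    convolution -- one plausible reading of 'parallel dense'."""
    def __init__(self, ch):
        super().__init__()
        self.b1, self.b2 = P3DBlock(ch), P3DBlock(ch)
        self.fuse = nn.Conv3d(3 * ch, ch, kernel_size=1)

    def forward(self, x):
        return self.fuse(torch.cat([x, self.b1(x), self.b2(x)], dim=1))
```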
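Finally, for step (4), a hybrid of feature-level and decision-level fusion can be expressed as below: the audio and video embeddings are concatenated for a joint (feature-level) classifier, and its logits are then combined with each single-modality head's logits through learned weights (decision-level). The embedding dimensions, head shapes, and weighting scheme are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class HybridFusion(nn.Module):
    """Hypothetical hybrid fusion: feature-level fusion (concatenate the
    audio and video embeddings, classify jointly) combined with
    decision-level fusion (weighted combination of the joint logits and
    each single-modality head's logits)."""
    def __init__(self, d_audio, d_video, n_classes):
        super().__init__()
        self.audio_head = nn.Linear(d_audio, n_classes)
        self.video_head = nn.Linear(d_video, n_classes)
        self.joint_head = nn.Sequential(
            nn.Linear(d_audio + d_video, 128), nn.ReLU(),
            nn.Linear(128, n_classes))
        # Learnable weights over the three decision streams.
        self.w = nn.Parameter(torch.ones(3))

    def forward(self, fa, fv):                 # fa: (B, d_audio), fv: (B, d_video)
        logits = torch.stack([self.audio_head(fa),
                              self.video_head(fv),
                              self.joint_head(torch.cat([fa, fv], dim=1))])
        w = torch.softmax(self.w, dim=0)       # normalised stream weights
        return (w[:, None, None] * logits).sum(dim=0)   # (B, n_classes)
```

A common design choice is to train all three heads jointly with a single cross-entropy loss on the fused logits, so the learned weights settle on whichever decision stream generalises best.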
Keywords/Search Tags: Emotion recognition, Deep learning, Speech recognition, Expression recognition, Multimodal feature fusion