
Research On Multi-modal Emotion Recognition Algorithm Based On Speech And Face Expression

Posted on: 2019-11-17
Degree: Master
Type: Thesis
Country: China
Candidate: T T Lu
Full Text: PDF
GTID: 2428330545960435
Subject: Communication and Information System

Abstract/Summary:
Natural human-computer interaction is an important research direction in current computer application technology, and automatic emotion recognition is one of the key technologies for realizing it. Convolutional neural networks can perform image feature extraction and pattern classification simultaneously, and their mechanisms of local connectivity and weight sharing reduce the number of trainable parameters. Therefore, building on an in-depth study of convolutional neural network theory, and in order to avoid the complex feature-extraction pipeline of traditional methods, this thesis applies convolutional neural networks to emotion recognition. New algorithms for speech emotion recognition and facial expression recognition based on convolutional neural networks are proposed, followed by a multi-modal fusion emotion recognition algorithm. The main research contents are as follows:

(1) Research on convolutional neural network theory. The fundamentals of convolutional neural networks and their parameter-learning algorithms are analyzed, providing the theoretical basis for combining convolutional neural networks with emotion recognition.

(2) A speech emotion recognition algorithm based on the spectrogram and a convolutional neural network is proposed. Since the spectrogram is a two-dimensional image that reflects the time-frequency characteristics of speech, it is used as the input to the convolutional neural network in order to overcome the complex feature extraction and poor feature quality of traditional recognition algorithms. The convolutional neural network automatically learns features from the spectrogram, realizing end-to-end processing, and is trained with supervised learning to obtain a suitable network model. In experiments on the CASIA corpus and the German Berlin corpus, the speech emotion recognition rates reach 79.6% and 77.8%, respectively, demonstrating the feasibility of the algorithm.

(3) An emotion recognition algorithm that fuses speech and facial expression is proposed. Because human emotions are expressed through multiple modalities simultaneously, recognition from a single modality has inherent limitations. Exploiting the complementarity between modalities, this thesis proposes a multi-modal emotion recognition algorithm based on speech and facial expression: convolutional neural networks automatically learn features from the facial expression images and the spectrograms, the trained network models classify the test samples of each modality, and decision-level fusion of the two results yields the final recognition result. In the fusion experiment on the eNTERFACE'05 audio-visual emotion database, the recognition rate reaches 84.8%, improving the overall performance of the recognition system to some extent.
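The spectrogram-plus-CNN pipeline of (2) and the decision-level fusion of (3) can be illustrated with a short Python sketch. The code below is a minimal, hypothetical example rather than the thesis implementation: the network architecture, emotion label set, spectrogram parameters, and equal-weight fusion rule are assumptions introduced only for illustration.

# A minimal, hypothetical sketch (not the thesis code): a log-spectrogram is
# computed from a speech signal, each modality is scored by a small CNN, and
# the class probabilities are combined by decision-level fusion. Network sizes,
# emotion labels, spectrogram parameters, and the fusion weight are assumed.
import numpy as np
import torch
import torch.nn as nn
from scipy.signal import spectrogram

EMOTIONS = ["angry", "happy", "sad", "surprise", "fear", "neutral"]  # assumed label set

class SmallCNN(nn.Module):
    """A tiny 2-D CNN classifier standing in for the (unspecified) thesis architecture."""
    def __init__(self, num_classes=len(EMOTIONS)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(32 * 4 * 4, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def speech_to_spectrogram(signal, fs=16000):
    """Convert a 1-D speech signal into a log-spectrogram 'image' of shape (1, F, T)."""
    _, _, sxx = spectrogram(signal, fs=fs, nperseg=512, noverlap=256)
    return torch.tensor(np.log(sxx + 1e-10), dtype=torch.float32).unsqueeze(0)

def decision_level_fusion(p_speech, p_face, w_speech=0.5):
    """Weighted sum of per-modality class probabilities (assumed fusion rule)."""
    return w_speech * p_speech + (1.0 - w_speech) * p_face

if __name__ == "__main__":
    speech_net, face_net = SmallCNN(), SmallCNN()         # untrained, for illustration only
    speech = np.random.randn(16000).astype(np.float32)    # 1 s of placeholder audio
    face = torch.randn(1, 1, 64, 64)                       # placeholder grayscale face image
    with torch.no_grad():
        p_speech = torch.softmax(speech_net(speech_to_spectrogram(speech).unsqueeze(0)), dim=1)
        p_face = torch.softmax(face_net(face), dim=1)
        fused = decision_level_fusion(p_speech, p_face)
    print("Fused prediction:", EMOTIONS[int(fused.argmax())])

In such a scheme, the fusion weight could be tuned on a validation set; the abstract itself only states that decision-level fusion is used, not how the modalities are weighted.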
Keywords/Search Tags: Convolutional neural network, Speech emotion recognition, Spectrogram, Facial expression recognition, Multimodal fusion