
Emotion Recognition Based On Multi-modal Information Fusion

Posted on: 2019-07-20
Degree: Master
Type: Thesis
Country: China
Candidate: N N Guo
Full Text: PDF
GTID: 2518306044960189
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Emotion recognition research is one of the key routes to achieving emotional intelligence. It involves many fields, including physiology, psychology, and cognitive science, and is a multidisciplinary research hotspot. Because single-modality emotion recognition (e.g., from voice, image, or text alone) is limited to a single kind of emotional feature, its recognition accuracy leaves room for improvement. In recent years, some scholars have proposed emotion recognition methods based on multi-modal information fusion, which have greatly improved recognition accuracy.

This thesis proposes an emotion recognition method that fuses emotional speech, facial expressions, and emotional text, implementing multi-modal fusion with two strategies: feature-level fusion and decision-level fusion. The CHEAVD 2.0 data set, established by the Institute of Automation of the Chinese Academy of Sciences, is adopted. It contains eight emotion classes (angry, sad, happy, anxious, surprised, disgusted, worried, and neutral) with a total of 5,624 corresponding multi-modal files. The specific research work of this thesis includes the following.

(1) Speech emotion recognition. This thesis first builds a bag-of-audio-words representation on top of Mel-frequency cepstral coefficients (MFCCs), transforming the original multi-frame MFCC feature vectors into fixed-dimension sentence-level feature vectors. These sentence-level vectors are then used as speech emotion features for recognition.

(2) Facial expression recognition. The video files in the data set are first split into frames, and face detection is applied to obtain facial expression data. A six-layer convolutional neural network is then designed to classify the expressions. To further improve accuracy, a fine-tuned VGG16 model is also used. Finally, the classification performance of the two models is compared; the model with the higher recognition accuracy is selected, the output of its fully connected layer is used as the facial expression feature for the feature-level fusion experiments, and its predictions are used for the decision-level fusion experiments.

(3) Text emotion recognition. First, speech recognition tools are used to extract the text content of the data set. The obtained text is then pre-processed (removing punctuation, word segmentation, and removing stop words) to obtain tokenized text data; at the same time, a skip-gram word-vector model is trained on the Chinese Wikipedia corpus and used to map the pre-processed tokens into word vectors. The commonly used simple-averaging method is then applied to obtain sentence-level feature vectors for text emotion recognition. Finally, to improve accuracy, this thesis proposes a text emotion recognition method based on a recurrent neural network: a dynamic recurrent neural network learns the sequential relationships among all lexical items in a sentence, yielding sentence-level feature vectors for text emotion recognition.

(4) Multi-modal emotion recognition. This thesis proposes a decision-level fusion method based on second training, whose basic idea is to fit, through training, the mapping between single-modality decision results and sample labels. Comparative experiments are conducted against the feature-level fusion method and six traditional decision-level fusion rules. Experimental results show that multi-modal emotion recognition achieves higher accuracy than single-modality recognition, and that, of the two fusion strategies, the proposed second-training decision-level fusion method obtains the higher recognition accuracy.
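The bag-of-audio-words step in (1) can be sketched as follows. This is a minimal numpy illustration, not the thesis's actual pipeline: MFCC extraction and codebook learning (typically k-means over training frames) are assumed to have been done elsewhere, and the frame/codebook sizes here are toy values.

```python
import numpy as np

def bag_of_audio_words(mfcc_frames, codebook):
    """Map a variable-length sequence of MFCC frames to a fixed-length
    normalized histogram over a learned codebook (bag-of-audio-words)."""
    # Euclidean distance from every frame to every codeword
    dists = np.linalg.norm(mfcc_frames[:, None, :] - codebook[None, :, :], axis=2)
    assignments = dists.argmin(axis=1)  # nearest codeword per frame
    hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
    return hist / hist.sum()            # normalize by the frame count

# Toy example: 5 frames of 13-dim MFCCs, codebook of 4 audio words
rng = np.random.default_rng(0)
frames = rng.normal(size=(5, 13))
codebook = rng.normal(size=(4, 13))
vec = bag_of_audio_words(frames, codebook)
print(vec.shape)  # (4,) -- fixed length regardless of utterance duration
```

The point of the transform is that utterances of any duration map to a vector whose length equals the codebook size, so a standard classifier can consume them.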
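The simple-averaging baseline for text features in (3) amounts to mean-pooling word vectors. A minimal sketch, with a two-word toy vocabulary standing in for a skip-gram model trained on Chinese Wikipedia; out-of-vocabulary tokens are simply skipped, which is one common convention (the thesis does not specify its OOV handling).

```python
import numpy as np

def sentence_vector(tokens, word_vectors, dim):
    """Average the word vectors of a tokenized sentence into a single
    fixed-length feature vector (the simple-averaging baseline)."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:                 # no known words: fall back to zeros
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

# Toy 2-dim vocabulary in place of a trained skip-gram model
wv = {"happy": np.array([1.0, 0.0]), "day": np.array([0.0, 1.0])}
sv = sentence_vector(["happy", "day", "oov"], wv, dim=2)
print(sv)  # [0.5 0.5]
```

This pooling discards word order, which is exactly the weakness the thesis's recurrent-network variant addresses by modeling the token sequence.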
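The second-training fusion idea in (4) can be illustrated as learning a mapping from concatenated single-modality decision scores to labels. The thesis does not specify the second-stage model, so this sketch uses a linear least-squares fit to one-hot labels purely for illustration; the data is synthetic.

```python
import numpy as np

def fit_fusion(decision_feats, labels, n_classes):
    """Second training: learn a linear map from concatenated single-modal
    decision scores to one-hot emotion labels via least squares."""
    Y = np.eye(n_classes)[labels]  # one-hot targets, shape (n, n_classes)
    W, *_ = np.linalg.lstsq(decision_feats, Y, rcond=None)
    return W

def fuse(decision_feats, W):
    """Predict the fused class as the argmax of the learned linear map."""
    return (decision_feats @ W).argmax(axis=1)

# Synthetic data: 3 modalities x 2 classes -> 6-dim decision-score features,
# each modality's scores being a noisy copy of the true one-hot label
rng = np.random.default_rng(1)
labels = rng.integers(0, 2, size=40)
feats = np.eye(2)[labels].repeat(3, axis=1) + 0.1 * rng.normal(size=(40, 6))
W = fit_fusion(feats, labels, n_classes=2)
acc = (fuse(feats, W) == labels).mean()
print(acc)  # near-perfect on this cleanly separable toy data
```

Unlike the fixed fusion rules, the second stage can learn per-modality reliability weights from data, which is the stated advantage of the approach.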
Keywords/Search Tags:speech emotion recognition, facial expression recognition, text emotion recognition, multi-modal emotion recognition, natural emotion data