
Audio-visual Emotion Recognition Based On Deep Learning And Backtracking Comparison

Posted on: 2020-04-12 | Degree: Master | Type: Thesis
Country: China | Candidate: B Zhang | Full Text: PDF
GTID: 2518306464995039 | Subject: Computer Science and Technology

Abstract/Summary:
Emotional interaction is an important way to realize human-computer interaction. By collecting multi-modal human information, a machine can recognize a person's emotional state and give reasonable feedback based on the corresponding emotion analysis results. However, the growing number of modalities, noise pollution, and data redundancy seriously degrade the efficiency of emotion recognition algorithms, so quickly and accurately extracting decisive emotional features from multiple modalities is a difficult problem in multi-modal emotion recognition. To address the low recognition rate caused by noise pollution and data redundancy, this thesis proposes an audio-visual emotion recognition method based on deep learning and backtracking comparison, consisting of three stages: video preprocessing, construction of an audio-visual emotion recognition model, and backtracking comparison. The main work of this thesis is as follows:

In the video preprocessing stage, each video is separated into two parts: a sequence of face image frames and an audio track. An audio volume threshold is used to filter and group the video data set, yielding image frame sequences and audio fragments with rich emotional features.

The audio-visual emotion recognition model has two branches: face image and speech. In the face image branch, the frame sequence is fed into a 3D-ResNet for feature extraction and emotion recognition. In the speech branch, the Mel-Frequency Cepstral Coefficients (MFCC) and the Mel-spectrogram are obtained by processing the speech signal in the frequency domain and are fed into parallel LSTM and CNN network structures, respectively, for feature extraction. The extracted emotional features are then combined in a fusion unit and finally passed to a Softmax function for emotion recognition.

The decision-level fusion process of backtracking comparison optimizes the audio-visual emotion recognition model. If the recognition results of the two branches are inconsistent, the method goes back to the video preprocessing stage, lowers the volume threshold, and selects the corresponding grouped image frame sequences and audio to reconstruct the emotion recognition model.

Experiments are carried out on the open-source video databases RML and eNTERFACE'05. The overall recognition rate increases with the number of backtracking rounds (and the corresponding experimental time); after three rounds of backtracking comparison, it reaches 83.58% on RML and 88.21% on eNTERFACE'05. Extensive experiments and comparisons with results reported in the related literature verify the effectiveness of the proposed audio-visual emotion recognition method.
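The volume-threshold filtering in the preprocessing stage can be sketched as a frame-level RMS energy test. This is a minimal illustration, not the thesis's actual pipeline: the frame length, hop size, and threshold value are assumptions, and a real implementation would also map the kept audio frames back to the matching face-image frames.

```python
import numpy as np

def filter_by_volume(audio, threshold, frame_len=2048, hop=512):
    """Return a boolean mask over audio frames whose RMS energy
    exceeds `threshold`; masked-in frames (and the corresponding
    image frames) are kept as emotionally rich segments."""
    n_frames = 1 + (len(audio) - frame_len) // hop
    rms = np.empty(n_frames)
    for i in range(n_frames):
        frame = audio[i * hop : i * hop + frame_len]
        rms[i] = np.sqrt(np.mean(frame ** 2))
    return rms >= threshold

# Synthetic example: 1 s of silence followed by 1 s of a 440 Hz tone at 16 kHz.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
audio = np.concatenate([np.zeros(sr), 0.5 * np.sin(2 * np.pi * 440 * t)])
mask = filter_by_volume(audio, threshold=0.1)
# Frames in the silent half are dropped; frames in the tone half are kept.
```

Lowering `threshold` admits more (and noisier) segments, which is exactly the knob the backtracking stage turns when the two modalities disagree.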
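The speech-branch inputs, the Mel-spectrogram and MFCC, can be computed as follows. This is a simplified textbook construction (Hann window, power spectrum, triangular mel filterbank, log, DCT-II), not the thesis's code; the FFT size, hop, and filter counts are illustrative choices.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_spectrogram(audio, sr, n_fft=512, hop=256, n_mels=26):
    # Frame, window, and FFT the signal to get the power spectrum.
    n_frames = 1 + (len(audio) - n_fft) // hop
    window = np.hanning(n_fft)
    spec = np.empty((n_frames, n_fft // 2 + 1))
    for i in range(n_frames):
        frame = audio[i * hop : i * hop + n_fft] * window
        spec[i] = np.abs(np.fft.rfft(frame)) ** 2
    # Triangular mel filterbank spanning 0 .. sr/2.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fbank[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[m - 1, k] = (r - k) / max(r - c, 1)
    return spec @ fbank.T          # shape: (n_frames, n_mels)

def mfcc(mel_spec, n_coeff=13):
    # Log of the mel energies followed by a DCT-II gives the cepstral coefficients.
    log_mel = np.log(mel_spec + 1e-10)
    n_mels = log_mel.shape[1]
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeff), (2 * n + 1) / (2.0 * n_mels)))
    return log_mel @ dct.T         # shape: (n_frames, n_coeff)

# Usage on 1 s of random noise at 8 kHz.
sr = 8000
audio = np.random.default_rng(0).standard_normal(sr)
mel = mel_spectrogram(audio, sr)   # (30, 26)
coeffs = mfcc(mel)                 # (30, 13)
```

In the model described above, the MFCC matrix (a time sequence) would feed the LSTM branch, while the Mel-spectrogram (an image-like 2-D array) would feed the CNN branch before fusion.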
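The backtracking-comparison loop itself reduces to: run both branches, accept on agreement, otherwise lower the volume threshold and retry. The sketch below captures only that control flow; the function names, the threshold schedule, and the three-round limit (matching the three rounds reported in the experiments) are assumptions, and the real method rebuilds the recognition model on the re-grouped data rather than calling fixed recognizers.

```python
def backtracking_recognition(video, recognize_visual, recognize_audio,
                             preprocess, init_threshold=0.3,
                             step=0.1, max_rounds=3):
    """Decision-level fusion with backtracking: if the two modality
    predictions disagree, return to preprocessing with a lower volume
    threshold and try again, up to `max_rounds` times."""
    threshold = init_threshold
    for _ in range(max_rounds):
        frames, clip = preprocess(video, threshold)
        v_label = recognize_visual(frames)
        a_label = recognize_audio(clip)
        if v_label == a_label:       # modalities agree: accept the label
            return v_label
        threshold -= step            # disagree: backtrack with a lower threshold
    return v_label                   # fall back to the visual prediction

# Stub recognizers that disagree at the initial threshold but agree after
# one backtracking round.
calls = []
def preprocess(video, thr):
    calls.append(thr)
    return thr, thr
recognize_visual = lambda frames: "happy"
recognize_audio = lambda clip: "happy" if clip < 0.25 else "sad"
label = backtracking_recognition("clip.avi", recognize_visual,
                                 recognize_audio, preprocess)
```

The stubs make the behavior visible: round one (threshold 0.3) disagrees, round two (threshold 0.2) agrees, and the loop stops early with the agreed label.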
Keywords/Search Tags: Visual-audio emotion recognition, 3D-ResNet, Backtracking method, Decision-level fusion