
Audio-visual Emotion Recognition Based On Deep Learning And Backtracking Comparison

Posted on: 2020-04-12 | Degree: Master | Type: Thesis
Country: China | Candidate: B Zhang | Full Text: PDF
GTID: 2518306464995039 | Subject: Computer Science and Technology

Abstract/Summary:
Emotional interaction is an important way to realize human-computer interaction. By collecting multi-modal human information, a machine can recognize a person's emotional state and give reasonable feedback based on the corresponding emotion analysis results. However, the growing number of modalities, noise pollution, and data redundancy seriously degrade the efficiency of emotion recognition algorithms, so quickly and accurately extracting decisive emotional features from multiple modalities is a difficult problem in multi-modal emotion recognition. To address the low recognition rate caused by noise pollution and data redundancy, this thesis proposes an audio-visual emotion recognition method based on deep learning and backtracking comparison, consisting of three stages: video preprocessing, construction of an audio-visual emotion recognition model, and backtracking comparison. The main work of this thesis is as follows:

In the video preprocessing stage, each video is separated into two parts: a sequence of face image frames and an audio track. An audio volume threshold is used to filter and group the video data set, yielding image frame sequences and audio fragments with rich emotional features.

The audio-visual emotion recognition model has two branches: face image and speech. In the face image branch, the frame sequence is fed into a 3D-ResNet for feature extraction and emotion recognition. In the speech branch, the Mel-Frequency Cepstral Coefficients (MFCC) and the Mel-spectrogram are obtained by processing the speech signal in the frequency domain and are fed into parallel LSTM and CNN network structures, respectively, for feature extraction. The extracted emotional features are then combined in a fusion unit and finally passed to a Softmax function for emotion recognition.

The decision-level fusion process of backtracking comparison optimizes the audio-visual emotion recognition model. If the recognition results of the two branches are inconsistent, the method goes back to the video preprocessing stage, lowers the volume threshold, and selects the corresponding grouped image frame sequences and audio to reconstruct the emotion recognition model.

Experiments are carried out on the open-source video databases RML and eNTERFACE'05. The overall recognition rate increases with the number of backtracking rounds (and the corresponding experimental time); after three rounds of backtracking comparison, it reaches 83.58% on RML and 88.21% on eNTERFACE'05. Extensive experiments and comparisons with results reported in the related literature verify the effectiveness of the proposed audio-visual emotion recognition method.
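The volume-threshold filtering in the preprocessing stage can be sketched as a frame-level RMS energy test. This is a minimal illustration, not the thesis's actual pipeline: the frame length, hop size, and threshold value are assumptions, and a real implementation would also map the kept audio frames back to the matching face-image frames.

```python
import numpy as np

def filter_by_volume(audio, threshold, frame_len=2048, hop=512):
    """Return a boolean mask over audio frames whose RMS energy
    exceeds `threshold`; masked-in frames (and the corresponding
    image frames) are kept as emotionally rich segments."""
    n_frames = 1 + (len(audio) - frame_len) // hop
    rms = np.empty(n_frames)
    for i in range(n_frames):
        frame = audio[i * hop : i * hop + frame_len]
        rms[i] = np.sqrt(np.mean(frame ** 2))
    return rms >= threshold

# Synthetic example: 1 s of silence followed by 1 s of a 440 Hz tone at 16 kHz.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
audio = np.concatenate([np.zeros(sr), 0.5 * np.sin(2 * np.pi * 440 * t)])
mask = filter_by_volume(audio, threshold=0.1)
# Frames in the silent half are dropped; frames in the tone half are kept.
```

Lowering `threshold` admits more (and noisier) segments, which is exactly the knob the backtracking stage turns when the two modalities disagree.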
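The speech-branch inputs, the Mel-spectrogram and MFCC, can be computed as follows. This is a simplified textbook construction (Hann window, power spectrum, triangular mel filterbank, log, DCT-II), not the thesis's code; the FFT size, hop, and filter counts are illustrative choices.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_spectrogram(audio, sr, n_fft=512, hop=256, n_mels=26):
    # Frame, window, and FFT the signal to get the power spectrum.
    n_frames = 1 + (len(audio) - n_fft) // hop
    window = np.hanning(n_fft)
    spec = np.empty((n_frames, n_fft // 2 + 1))
    for i in range(n_frames):
        frame = audio[i * hop : i * hop + n_fft] * window
        spec[i] = np.abs(np.fft.rfft(frame)) ** 2
    # Triangular mel filterbank spanning 0 .. sr/2.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fbank[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[m - 1, k] = (r - k) / max(r - c, 1)
    return spec @ fbank.T          # shape: (n_frames, n_mels)

def mfcc(mel_spec, n_coeff=13):
    # Log of the mel energies followed by a DCT-II gives the cepstral coefficients.
    log_mel = np.log(mel_spec + 1e-10)
    n_mels = log_mel.shape[1]
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeff), (2 * n + 1) / (2.0 * n_mels)))
    return log_mel @ dct.T         # shape: (n_frames, n_coeff)

# Usage on 1 s of random noise at 8 kHz.
sr = 8000
audio = np.random.default_rng(0).standard_normal(sr)
mel = mel_spectrogram(audio, sr)   # (30, 26)
coeffs = mfcc(mel)                 # (30, 13)
```

In the model described above, the MFCC matrix (a time sequence) would feed the LSTM branch, while the Mel-spectrogram (an image-like 2-D array) would feed the CNN branch before fusion.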
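The backtracking-comparison loop itself reduces to: run both branches, accept on agreement, otherwise lower the volume threshold and retry. The sketch below captures only that control flow; the function names, the threshold schedule, and the three-round limit (matching the three rounds reported in the experiments) are assumptions, and the real method rebuilds the recognition model on the re-grouped data rather than calling fixed recognizers.

```python
def backtracking_recognition(video, recognize_visual, recognize_audio,
                             preprocess, init_threshold=0.3,
                             step=0.1, max_rounds=3):
    """Decision-level fusion with backtracking: if the two modality
    predictions disagree, return to preprocessing with a lower volume
    threshold and try again, up to `max_rounds` times."""
    threshold = init_threshold
    for _ in range(max_rounds):
        frames, clip = preprocess(video, threshold)
        v_label = recognize_visual(frames)
        a_label = recognize_audio(clip)
        if v_label == a_label:       # modalities agree: accept the label
            return v_label
        threshold -= step            # disagree: backtrack with a lower threshold
    return v_label                   # fall back to the visual prediction

# Stub recognizers that disagree at the initial threshold but agree after
# one backtracking round.
calls = []
def preprocess(video, thr):
    calls.append(thr)
    return thr, thr
recognize_visual = lambda frames: "happy"
recognize_audio = lambda clip: "happy" if clip < 0.25 else "sad"
label = backtracking_recognition("clip.avi", recognize_visual,
                                 recognize_audio, preprocess)
```

The stubs make the behavior visible: round one (threshold 0.3) disagrees, round two (threshold 0.2) agrees, and the loop stops early with the agreed label.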
Keywords/Search Tags: Visual-audio emotion recognition, 3D-ResNet, Backtracking method, Decision-level fusion