
Speech And Facial Expression Dual-Modal Emotion Recognition

Posted on: 2021-07-10
Degree: Master
Type: Thesis
Country: China
Candidate: K Q Ren
Full Text: PDF
GTID: 2518306554466194
Subject: Software engineering
Abstract/Summary:
Emotion plays a very important role in daily human life and helps to express an individual's thoughts. Emotion recognition is a research topic at the intersection of artificial intelligence, pattern recognition, and human-computer interaction. Most early research on emotion recognition was based on a single modality; it was later found that single-modality emotion recognition has severe limitations, whereas the emotional features extracted from different modalities can complement each other to a certain extent, so recognition accuracy can be further improved by fusing modalities. Speech and facial expressions are the fastest and most direct ways for humans to express emotion, and have therefore become the most important modalities in emotion recognition. In recent years, deep learning has been widely applied in many fields of artificial intelligence and has achieved remarkable results. This thesis therefore exploits the advantages of deep learning, applies it to multi-modal emotion recognition, and studies deep-learning-based dual-modal emotion recognition from speech and facial expressions. The main work includes:

1. Most current research uses the one-dimensional speech signal as input, ignoring the correlation between the time and frequency domains, and leaves silent frames and emotion-irrelevant frames unprocessed, which makes the resulting emotional features poorly discriminative. To address this, an attention-based deep convolutional neural network model for speech emotion recognition, ADCNN, is proposed. In this model, the log mel-spectrogram of the speech signal together with its first- and second-order difference coefficients is used as input to a convolutional neural network that extracts deep features for each speech segment (a feature-construction sketch follows the abstract). A temporal pyramid matching algorithm then converts the variable-length segment-level emotional features into utterance-level emotional features of fixed dimension. Finally, an attention mechanism assigns different weights to the extracted features, and an SVM classifier performs the emotion classification (see the second sketch below). Experimental results show that by combining the time-domain and frequency-domain characteristics of the speech signal, the ADCNN model effectively improves the accuracy of speech emotion recognition.

2. Because single-modality emotion recognition is limited and the emotional features extracted from different modalities complement each other to some extent, a dual-modal emotion recognition method fusing speech and facial expressions is proposed. First, the ADCNN model extracts speech emotional features, and a 3D-CNN extracts facial-expression emotional features. A fusion network built from a DBN model then fuses the features of the two modalities and captures the implicit nonlinear correlations between speech and facial-expression features (a fusion sketch follows below). Finally, the fused features are fed into a support vector machine for emotion recognition. Experimental results show that dual-modal emotion recognition compensates well for the limitations of a single modality and achieves higher accuracy.
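The three-channel CNN input described in point 1 can be built, for example, with librosa. The abstract does not name the thesis's tooling, so the following is only a minimal sketch under that assumption; the sampling rate and n_mels are hypothetical parameters.

    import numpy as np
    import librosa

    def build_adcnn_input(wav_path, sr=16000, n_mels=64):
        """Build a 3-channel (static, delta, delta-delta) log mel-spectrogram."""
        y, sr = librosa.load(wav_path, sr=sr)
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
        log_mel = librosa.power_to_db(mel)                 # static log mel-spectrogram
        delta1 = librosa.feature.delta(log_mel, order=1)   # first-order difference
        delta2 = librosa.feature.delta(log_mel, order=2)   # second-order difference
        # Stack into shape (3, n_mels, time): one "image" per channel, CNN-ready,
        # so the network sees time-frequency structure rather than a 1-D waveform.
        return np.stack([log_mel, delta1, delta2], axis=0)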
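The second sketch illustrates the back end of point 1: temporal pyramid pooling of variable-length segment features into a fixed-dimension utterance vector, a simple attention reweighting, and an SVM classifier. The pyramid levels, feature dimension, and attention form are assumptions for illustration, not the thesis's exact configuration.

    import numpy as np
    from sklearn.svm import SVC

    def temporal_pyramid_cells(segments, levels=(1, 2, 4)):
        """Mean-pool (T, D) segment features into sum(levels) fixed cells."""
        T = len(segments)
        cells = []
        for level in levels:
            bounds = np.linspace(0, T, level + 1).astype(int)
            for lo, hi in zip(bounds[:-1], bounds[1:]):
                cells.append(segments[lo:max(hi, lo + 1)].mean(axis=0))
        return np.stack(cells)                  # (sum(levels), D), independent of T

    def attention_reweight(cells, w):
        """Softmax-score each cell against a (hypothetical) learned vector w."""
        scores = cells @ w
        alpha = np.exp(scores - scores.max())
        alpha /= alpha.sum()
        return (alpha[:, None] * cells).ravel() # fixed-dimension utterance vector

    # Usage sketch with dummy data: X holds one utterance vector per sample.
    rng = np.random.default_rng(0)
    segs = [rng.normal(size=(rng.integers(20, 40), 128)) for _ in range(10)]
    w = rng.normal(size=128)
    X = np.stack([attention_reweight(temporal_pyramid_cells(s), w) for s in segs])
    y = rng.integers(0, 4, size=10)             # 4 emotion classes, dummy labels
    SVC(kernel="rbf").fit(X, y)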
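For the dual-modal method in point 2, the thesis fuses modalities with a DBN, which is generatively pre-trained; as a rough stand-in only, the fusion step can be pictured as a small feed-forward network over the concatenated modality features (PyTorch here, all layer sizes hypothetical), whose output would then go to the SVM.

    import torch
    import torch.nn as nn

    class FusionNet(nn.Module):
        """Stand-in for the DBN fusion network; sigmoid layers echo RBM units."""
        def __init__(self, speech_dim=896, face_dim=512, hidden=256, fused=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(speech_dim + face_dim, hidden), nn.Sigmoid(),
                nn.Linear(hidden, fused), nn.Sigmoid(),
            )

        def forward(self, speech_feat, face_feat):
            # Concatenate the two modalities, then learn a joint representation
            # that can capture nonlinear correlations between them.
            return self.net(torch.cat([speech_feat, face_feat], dim=-1))

    net = FusionNet()
    fused = net(torch.randn(4, 896), torch.randn(4, 512))  # (4, 128) fused features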
Keywords/Search Tags:emotion recognition, deep learning, feature extraction, attention mechanism, modal fusion