
Research On Emotion Recognition Based On Speech And Facial Expression

Posted on: 2022-09-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y X Ma
Full Text: PDF
GTID: 1488306572973519
Subject: Computer system architecture
Abstract/Summary:
Emotion is a fascinating phenomenon closely tied to people's daily lives, decision-making, and physical and mental health. As one of the frontier research directions in artificial intelligence, emotion recognition has great potential application value in scenarios involving intelligent human-computer interaction, so research on emotion recognition methods is of great theoretical and practical significance. Speech and facial expression are the two most common channels for conveying affective information, and both allow convenient, non-intrusive data acquisition. The main challenges in current emotion recognition research are: (1) emotional information is difficult to represent efficiently; (2) identity diversity interferes with emotion recognition algorithms; and (3) multimodal emotional information is difficult to fuse efficiently. To address these problems, this thesis carries out in-depth research on speech emotion recognition, facial expression recognition, and multimodal emotion fusion. The main work and innovative contributions, each accompanied by an illustrative sketch below, are as follows:

To address the difficulty of representing the emotional information in speech signals, a neural network structure combining time-frequency convolution and sequence modeling, customized for acoustic data, is proposed. It overcomes the limitations of traditional methods that rely solely on digital signal processing or generic neural network structures for emotional feature extraction, and it exploits the representational power of neural networks without losing the emotional information in the speech signal. Experimental results show that methods based on time-frequency convolution and sequence modeling achieve higher emotion recognition accuracy.

To address the interference that identity diversity causes in emotion recognition, an identity/emotion coupling loss function is proposed. While measuring facial expression recognition performance, this loss function also measures the model's performance on the face identification task, which alleviates the interference of identity diversity with expression recognition. Experiments show that, compared with the traditional cross-entropy loss, a model trained with the identity/emotion coupling loss not only achieves higher expression recognition accuracy but also reduces the emotion-vector bias introduced by identity diversity.

To address the problems of "information pollution", "information redundancy", and "inefficient emotion fusion" in audio-visual emotion recognition, a deep weighted audio-visual emotion fusion method is proposed. It applies a frame-level hard weighting strategy to the associated modeling of the audio and visual modalities, and it uses deep neural networks both to compute emotion representations and to perform a highly nonlinear fusion of the multimodal representations. Experimental results indicate that this method significantly alleviates all three problems.

To address the difficulty of fully mining and efficiently exploiting multimodal emotional information, a cross-modal attention method for continuous dimensional emotion estimation is proposed. It uses multimodal association modeling to explore the internal relations between face and speech, combines them with important prior knowledge, constrains the fusion of multimodal emotion representations by means of multimodal attention, and finally estimates the continuous dimensional emotion states simultaneously through multi-task learning. Experimental results show that this method improves the concordance correlation coefficient (CCC) of multidimensional continuous emotion estimation.
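As a rough illustration of the first contribution, the following minimal PyTorch sketch pairs convolutions oriented along the frequency and time axes of a spectrogram with a recurrent layer for sequence modeling. It assumes log-Mel spectrogram input; the layer shapes, kernel sizes, and the choice of an LSTM are illustrative assumptions, not the exact architecture of the thesis.

```python
import torch
import torch.nn as nn

class TFConvSeqNet(nn.Module):
    """Hypothetical time-frequency convolution + sequence-modeling network."""
    def __init__(self, n_mels=64, n_emotions=4):
        super().__init__()
        # Time-frequency convolution: separate kernels oriented along the
        # frequency axis and the time axis, rather than square kernels.
        self.freq_conv = nn.Conv2d(1, 16, kernel_size=(8, 1))   # spans frequency
        self.time_conv = nn.Conv2d(16, 32, kernel_size=(1, 5))  # spans time
        self.pool = nn.AdaptiveAvgPool2d((1, None))             # collapse frequency
        # Sequence modeling over the remaining time axis.
        self.rnn = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
        self.head = nn.Linear(64, n_emotions)

    def forward(self, spec):                  # spec: (batch, 1, n_mels, frames)
        h = torch.relu(self.freq_conv(spec))
        h = torch.relu(self.time_conv(h))
        h = self.pool(h).squeeze(2)           # (batch, 32, frames')
        h = h.transpose(1, 2)                 # (batch, frames', 32)
        _, (hn, _) = self.rnn(h)              # last hidden state summarizes time
        return self.head(hn[-1])              # emotion logits
```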
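The second contribution can be read as a joint objective over an expression head and an identity head that share one backbone. The abstract does not give the exact form of the coupling, so the additive combination and the weight `lam` in the sketch below are assumptions:

```python
import torch.nn.functional as F

def coupled_loss(emotion_logits, identity_logits,
                 emotion_labels, identity_labels, lam=0.5):
    """Hypothetical identity/emotion coupling loss: scores the expression
    recognition task and the face identification task at once. The additive
    form and the weight `lam` are assumptions, not the thesis's formulation."""
    l_emotion = F.cross_entropy(emotion_logits, emotion_labels)
    l_identity = F.cross_entropy(identity_logits, identity_labels)
    return l_emotion + lam * l_identity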
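For the third contribution, one way to realize frame-level hard weighting is to let a gating network score the two modalities per frame and select between them with a hard one-hot weight, followed by a small nonlinear fusion network. The Gumbel-softmax trick used below to keep the hard weights trainable, and all dimensions, are assumptions:

```python
import torch
import torch.nn as nn

class HardWeightedFusion(nn.Module):
    """Hypothetical frame-level hard weighting of audio/visual embeddings."""
    def __init__(self, dim=128, n_emotions=4):
        super().__init__()
        self.gate = nn.Linear(2 * dim, 2)   # per-frame scores: audio vs. visual
        self.fuse = nn.Sequential(          # nonlinear multimodal fusion
            nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, n_emotions))

    def forward(self, audio, visual):       # both: (batch, frames, dim), aligned
        scores = self.gate(torch.cat([audio, visual], dim=-1))
        w = nn.functional.gumbel_softmax(scores, hard=True)  # one-hot per frame
        fused = w[..., 0:1] * audio + w[..., 1:2] * visual
        return self.fuse(fused.mean(dim=1))  # emotion logits
```

A hard (one-hot) weight lets each frame be dominated by whichever modality is informative there, which is one plausible reading of how such a strategy could limit "information pollution" from a noisy modality.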
The methods proposed in this thesis effectively improve the performance of emotion recognition based on speech and facial expression, and they have important theoretical and practical value.
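For reference, the concordance correlation coefficient (CCC) reported above is a standard agreement metric for continuous emotion estimation: CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). A generic NumPy implementation, not taken from the thesis:

```python
import numpy as np

def ccc(x, y):
    """Concordance correlation coefficient between predictions x and labels y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()  # population covariance
    return 2 * cov / (x.var() + y.var() + (mx - my) ** 2)
```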
Keywords/Search Tags: Speech, Facial Expression, Emotion Recognition, Multi-modal, Machine Learning