
Research On Emotion Recognition Based On Expression And Speech Signal

Posted on: 2021-04-23    Degree: Master    Type: Thesis
Country: China    Candidate: H P Zhang    Full Text: PDF
GTID: 2428330605468899    Subject: Control engineering
Abstract/Summary:
With the aging of the population and the growing number of empty-nest elderly, home service robots have become a research hotspot. A robot that can autonomously analyze human emotions can provide better service to its users. Because most emotional information in daily life is conveyed through facial expressions and speech, facial expression recognition (FER) and speech emotion recognition (SER) have become important parts of emotion recognition research. At the same time, with the continuous development of artificial intelligence and computer vision technology, emotion recognition methods based on deep learning and image processing are widely used. On this basis, this thesis studies how to further improve the accuracy of facial expression recognition and speech emotion recognition.

Facial expression recognition can be divided into recognition of static expression images and recognition of expression image sequences. For static images, which are easily affected by background information, this thesis extracts the facial foreground region to improve the recognition rate. Because a single image feature yields limited recognition performance, this thesis fuses an RGB image channel with a local binary pattern (LBP) image channel and proposes a dual-channel weighted mixture convolutional neural network (WMCNN). The model achieves recognition rates of 99.07%, 92.38%, 86.034% and 78.24% on the CK+, JAFFE, Oulu and MMI datasets respectively, improving on existing methods. Comparison with single-channel networks shows that adding the LBP image channel effectively improves recognition accuracy.

To address the poor generalization caused by the small size of common public facial expression datasets, this thesis uses an expression GAN (ExGAN) to augment the existing data and constructs the Our-DB expression dataset. Experiments show that training the WMCNN model on the augmented data improves both its generalization and its recognition performance. In addition, to better recognize hard-to-classify expressions, this thesis extends the WMCNN model with an attention network and an attention loss, yielding the dual-channel weighted mixture attention convolutional neural network (AWMCNN). On the Our-DB dataset, AWMCNN outperforms WMCNN and recognizes hard-to-classify expression samples better.

Because a single frame can easily lead to recognition errors, a facial expression recognition method based on video sequences is also used. Building on the static-image methods above, a dual-channel weighted mixture convolutional long short-term memory network (WMCNN-LSTM) and a dual-channel weighted mixture attention convolutional bidirectional LSTM network (AWMCNN-BiLSTM) are proposed to further improve recognition accuracy. Ten-fold cross-validation of WMCNN-LSTM on the CK+, Oulu and MMI datasets gives 98.75%, 87.91% and 87.14% respectively; compared with the static-image model, WMCNN-LSTM further improves the accuracy of facial expression recognition.
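As a rough illustration of the dual-channel weighted mixture idea underlying the facial expression models above, the following PyTorch sketch fuses an RGB branch and an LBP branch with a learnable weight. The backbone layers, the sigmoid-bounded fusion weight, and all names (SmallBackbone, DualChannelWMCNN, alpha) are assumptions for illustration, not the thesis architecture.

# Illustrative sketch of a dual-channel weighted-mixture CNN, assuming a
# small convolutional backbone and a single learnable fusion weight.
import torch
import torch.nn as nn

class SmallBackbone(nn.Module):
    """Small convolutional feature extractor shared in structure by both channels."""
    def __init__(self, in_channels):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, x):
        return self.features(x).flatten(1)      # (batch, 64)

class DualChannelWMCNN(nn.Module):
    """RGB channel + LBP channel, fused by a learnable weight before classification."""
    def __init__(self, num_classes=7):
        super().__init__()
        self.rgb_branch = SmallBackbone(in_channels=3)   # RGB image channel
        self.lbp_branch = SmallBackbone(in_channels=1)   # LBP image channel
        self.alpha = nn.Parameter(torch.tensor(0.5))     # learnable fusion weight
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, rgb, lbp):
        f_rgb = self.rgb_branch(rgb)
        f_lbp = self.lbp_branch(lbp)
        w = torch.sigmoid(self.alpha)            # keep the weight in (0, 1)
        fused = w * f_rgb + (1 - w) * f_lbp      # weighted mixture of the two channels
        return self.classifier(fused)

# Usage: logits = DualChannelWMCNN()(rgb_batch, lbp_batch)

For sequence recognition, the frame-level features of such a network could be fed to an LSTM or BiLSTM, which is the general pattern the WMCNN-LSTM and AWMCNN-BiLSTM models follow.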
At the same time, to illustrate the recognition performance of the AWMCNN-BiLSTM model, this thesis compares WMCNN-LSTM and AWMCNN-BiLSTM on the Our-DB dataset, obtaining 90.438% and 91.825% respectively. The comparison shows that AWMCNN-BiLSTM recognizes facial expression image sequences better than WMCNN-LSTM.

In speech emotion recognition, relying on a single speech feature limits recognition accuracy. This thesis analyzes how the spectrogram and the 3-D Log-Mel feature map represent emotional speech, and proposes an AWMCNN-BiLSTM network that combines these two features: the spectrogram and the 3-D Log-Mel feature map serve as the inputs of the two channels, and the outputs of the two channels are combined at the decision level by weighted fusion to obtain the final recognition result. The unweighted accuracy of the model on the public speech emotion datasets IEMOCAP and EMO-DB is 69.2% and 93.05% respectively, higher than that of other existing methods.
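As a minimal sketch of the decision-level weighted fusion described for the speech model, the snippet below combines the class probabilities of a spectrogram channel and a 3-D Log-Mel channel. The fusion weight of 0.6 and the name fuse_decisions are assumptions for illustration, not values taken from the thesis.

# Illustrative decision-level weighted fusion of two speech channels.
import torch
import torch.nn.functional as F

def fuse_decisions(logits_spectrogram, logits_logmel, weight=0.6):
    """Weighted fusion of the two channels' class probabilities at the decision level."""
    p_spec = F.softmax(logits_spectrogram, dim=-1)   # spectrogram channel probabilities
    p_mel = F.softmax(logits_logmel, dim=-1)         # 3-D Log-Mel channel probabilities
    p_fused = weight * p_spec + (1 - weight) * p_mel
    return p_fused.argmax(dim=-1)                    # predicted emotion labels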
Keywords/Search Tags: Facial expression recognition, Speech emotion recognition, CNN, RNN, GAN