
Research On Emotion Recognition Based On Expression And Speech Signal

Posted on: 2021-04-23    Degree: Master    Type: Thesis
Country: China    Candidate: H P Zhang    Full Text: PDF
GTID: 2428330605468899    Subject: Control engineering
Abstract/Summary:
With the aging of the population and the growing number of empty-nest elderly, home service robots have become a research hotspot. A robot that can autonomously analyze human emotions can provide better service to its users. Because most emotional information in daily life is conveyed through facial expressions and speech, facial expression recognition (FER) and speech emotion recognition (SER) have become important parts of emotion recognition research. At the same time, with the continuous development of artificial intelligence and computer vision technology, emotion recognition methods based on deep learning and image processing are widely used. On this basis, this thesis studies how to further improve the accuracy of facial expression recognition and speech emotion recognition.

Facial expression recognition can be divided into recognition of static expression images and recognition of expression image sequences. For static images, which are easily affected by background information, this thesis extracts the facial foreground region to improve the recognition rate. Because a single image feature yields limited recognition performance, this thesis fuses an RGB image channel with a local binary pattern (LBP) image channel and proposes a dual-channel weighted mixture convolutional neural network (WMCNN). The model achieves recognition rates of 99.07%, 92.38%, 86.034% and 78.24% on the CK+, JAFFE, Oulu and MMI datasets respectively, improving on existing methods. Comparison with single-channel networks shows that adding the LBP image channel effectively improves recognition accuracy.

To address the poor generalization caused by the small size of common public facial expression datasets, this thesis uses an expression GAN (ExGAN) to augment the existing data and constructs the Our-DB expression dataset. Experiments show that training the WMCNN model on the augmented data improves both its generalization and its recognition performance. In addition, to better recognize hard-to-classify expressions, this thesis extends the WMCNN model with an attention network and an attention loss, yielding the dual-channel weighted mixture attention convolutional neural network (AWMCNN). On the Our-DB dataset, AWMCNN outperforms WMCNN and recognizes hard-to-classify expression samples better.

Because a single frame can easily lead to recognition errors, a facial expression recognition method based on video sequences is also used. Building on the static-image methods above, a dual-channel weighted mixture convolutional long short-term memory network (WMCNN-LSTM) and a dual-channel weighted mixture attention convolutional bidirectional LSTM network (AWMCNN-BiLSTM) are proposed to further improve recognition accuracy. Ten-fold cross-validation of WMCNN-LSTM on the CK+, Oulu and MMI datasets gives 98.75%, 87.91% and 87.14% respectively; compared with the static-image model, WMCNN-LSTM further improves the accuracy of facial expression recognition.
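As a rough illustration of the dual-channel weighted mixture idea underlying the facial expression models above, the following PyTorch sketch fuses an RGB branch and an LBP branch with a learnable weight. The backbone layers, the sigmoid-bounded fusion weight, and all names (SmallBackbone, DualChannelWMCNN, alpha) are assumptions for illustration, not the thesis architecture.

# Illustrative sketch of a dual-channel weighted-mixture CNN, assuming a
# small convolutional backbone and a single learnable fusion weight.
import torch
import torch.nn as nn

class SmallBackbone(nn.Module):
    """Small convolutional feature extractor shared in structure by both channels."""
    def __init__(self, in_channels):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, x):
        return self.features(x).flatten(1)      # (batch, 64)

class DualChannelWMCNN(nn.Module):
    """RGB channel + LBP channel, fused by a learnable weight before classification."""
    def __init__(self, num_classes=7):
        super().__init__()
        self.rgb_branch = SmallBackbone(in_channels=3)   # RGB image channel
        self.lbp_branch = SmallBackbone(in_channels=1)   # LBP image channel
        self.alpha = nn.Parameter(torch.tensor(0.5))     # learnable fusion weight
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, rgb, lbp):
        f_rgb = self.rgb_branch(rgb)
        f_lbp = self.lbp_branch(lbp)
        w = torch.sigmoid(self.alpha)            # keep the weight in (0, 1)
        fused = w * f_rgb + (1 - w) * f_lbp      # weighted mixture of the two channels
        return self.classifier(fused)

# Usage: logits = DualChannelWMCNN()(rgb_batch, lbp_batch)

For sequence recognition, the frame-level features of such a network could be fed to an LSTM or BiLSTM, which is the general pattern the WMCNN-LSTM and AWMCNN-BiLSTM models follow.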
At the same time, to illustrate the recognition performance of the AWMCNN-BiLSTM model, this thesis compares WMCNN-LSTM and AWMCNN-BiLSTM on the Our-DB dataset, obtaining 90.438% and 91.825% respectively. The comparison shows that AWMCNN-BiLSTM recognizes facial expression image sequences better than WMCNN-LSTM.

In speech emotion recognition, relying on a single speech feature limits recognition accuracy. This thesis analyzes how the spectrogram and the 3-D Log-Mel feature map represent emotional speech, and proposes an AWMCNN-BiLSTM network that combines these two features: the spectrogram and the 3-D Log-Mel feature map serve as the inputs of the two channels, and the outputs of the two channels are combined at the decision level by weighted fusion to obtain the final recognition result. The unweighted accuracy of the model on the public speech emotion datasets IEMOCAP and EMO-DB is 69.2% and 93.05% respectively, higher than that of other existing methods.
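As a minimal sketch of the decision-level weighted fusion described for the speech model, the snippet below combines the class probabilities of a spectrogram channel and a 3-D Log-Mel channel. The fusion weight of 0.6 and the name fuse_decisions are assumptions for illustration, not values taken from the thesis.

# Illustrative decision-level weighted fusion of two speech channels.
import torch
import torch.nn.functional as F

def fuse_decisions(logits_spectrogram, logits_logmel, weight=0.6):
    """Weighted fusion of the two channels' class probabilities at the decision level."""
    p_spec = F.softmax(logits_spectrogram, dim=-1)   # spectrogram channel probabilities
    p_mel = F.softmax(logits_logmel, dim=-1)         # 3-D Log-Mel channel probabilities
    p_fused = weight * p_spec + (1 - weight) * p_mel
    return p_fused.argmax(dim=-1)                    # predicted emotion labels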
Keywords/Search Tags: Facial expression recognition, Speech emotion recognition, CNN, RNN, GAN