
Research On Multi-modal Emotion Recognition Method Combining Speech And Expression

Posted on: 2021-05-07
Degree: Master
Type: Thesis
Country: China
Candidate: J B Zhang
Full Text: PDF
GTID: 2428330605973096
Subject: Communication and Information System
Abstract/Summary:
Emotion recognition is a research hotspot in the fields of computer vision and pattern recognition, and with the development of artificial intelligence and deep learning technology it has attracted extensive attention from researchers. Among the many ways emotions are expressed, speech and facial expression are the two most direct and reliable carriers, so research on multi-modal emotion recognition methods combining speech and expression has important practical significance.

Aiming at the low accuracy of speech emotion recognition caused by emotion-unrelated factors such as speaker identity, speaking style, and environment, a speech emotion recognition algorithm based on an attention model and a convolutional neural network is proposed. Exploiting the convolutional neural network's ability to process images and to effectively extract time-frequency features from time-series data, the static, first-order difference, and second-order difference Mel spectrograms are used as the network's input; an attention model then identifies and discards silent frames and emotion-irrelevant frames, retaining the effective emotional information, and a Softmax classifier performs the final speech emotion classification. Experiments on the IEMOCAP and Emo-DB databases yield recognition accuracies of 89.25% and 88.57%, respectively. Compared with the previous best result on IEMOCAP, the 84.52% achieved by fusing audio features with a multiple-kernel learning algorithm, the proposed algorithm improves accuracy by 4.73%; compared with the previous best result on Emo-DB, the 86.11% achieved by a BP-neural-network feature selection method, it improves accuracy by 2.46%. This prepares the ground for improving the accuracy of multi-modal emotion recognition.

Aiming at the problem that speech is easily affected by ambient noise, which lowers the recognition rate, and exploiting the complementarity of different modalities' emotional
information, a multi-modal emotion recognition method combining speech and facial expression is proposed. It integrates speech and expression information with both feature-layer and decision-layer fusion strategies and improves on the traditional fusion algorithms: a multi-modal emotion recognition algorithm with dual fusion at the feature layer and the decision layer is proposed, which retains both the differences between the modalities' emotional information and the correlations among them. Experiments on the eNTERFACE'05 multi-modal emotion database show that the recognition rate reaches 89.3%; compared with the 83.92% obtained by the best-performing kernel-space feature fusion method, the proposed algorithm improves recognition accuracy by 5.38%.
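The speech front end described above (static, first-order difference, and second-order difference Mel spectrograms stacked as channels, with attention pooling that down-weights silent frames) can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the thesis implementation: the regression-style delta formula, the window width, and the learned scoring vector are all placeholders.

```python
import numpy as np

def delta(feat, width=2):
    """First-order regression difference along the time axis.
    feat: (frames x mel_bands). Uses the standard regression formula
    over +/- width neighboring frames with edge padding."""
    padded = np.pad(feat, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(feat)
    for n in range(1, width + 1):
        out += n * (padded[width + n:width + n + feat.shape[0]]
                    - padded[width - n:width - n + feat.shape[0]])
    return out / denom

def attention_pool(frames, w):
    """Softmax attention over frames: frames (T x D), w (D,) scoring vector.
    Frames with low scores (e.g. silence) receive near-zero weight, so the
    pooled vector is dominated by emotion-carrying frames."""
    scores = frames @ w                        # (T,) one score per frame
    scores -= scores.max()                     # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ frames                      # weighted sum -> (D,)

# Toy Mel spectrogram: 50 frames x 40 Mel bands (random stand-in for audio)
rng = np.random.default_rng(0)
mel = rng.standard_normal((50, 40))

# Stack static / delta / delta-delta as a 3-channel CNN input
x = np.stack([mel, delta(mel), delta(delta(mel))])
print(x.shape)  # (3, 50, 40)
```

In a full pipeline the CNN would produce the per-frame feature vectors fed to `attention_pool`, and a Softmax layer over emotion classes would follow the pooled vector.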
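The dual-fusion idea above (feature-layer fusion to capture cross-modal correlation, decision-layer fusion to preserve per-modality differences, combined into one score) can be illustrated with a minimal sketch. The linear classifiers, the equal decision weights, and the blending weight `w` are assumptions for illustration, not the thesis design.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dual_fusion(speech_feat, face_feat, W_fused, W_speech, W_face, w=0.5):
    """Feature-layer fusion: concatenate modality features and classify
    jointly (captures correlation between modalities). Decision-layer
    fusion: classify each modality alone and average the probabilities
    (keeps each modality's distinct evidence). The final score blends
    the two paths with weight w."""
    p_feature = softmax(np.concatenate([speech_feat, face_feat]) @ W_fused)
    p_decision = 0.5 * (softmax(speech_feat @ W_speech)
                        + softmax(face_feat @ W_face))
    return w * p_feature + (1 - w) * p_decision

# Toy 64-dim features per modality, 6 emotion classes, random weights
rng = np.random.default_rng(1)
speech, face, n_classes = rng.standard_normal(64), rng.standard_normal(64), 6
p = dual_fusion(speech, face,
                rng.standard_normal((128, n_classes)),
                rng.standard_normal((64, n_classes)),
                rng.standard_normal((64, n_classes)))
print(p.shape)  # (6,) - a probability distribution over emotion classes
```

Because both paths output normalized distributions, the blended score is itself a valid distribution, and the predicted emotion is simply its argmax.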
Keywords/Search Tags:Multimodal emotion recognition, speech emotion recognition, expression recognition, convolutional recurrent neural network, attention model