Research On Dimensional Speech Emotion Recognition Based On The Multi-granularity Feature Fusion

Posted on: 2017-04-19
Degree: Master
Type: Thesis
Country: China
Candidate: X Chen
GTID: 2308330503487209
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of speech processing in artificial intelligence, people hope to communicate with computers as naturally as they do with one another. Building on speech processing, speech emotion recognition plays an important role in human-computer interaction and has received much attention. With recent advances in psychology, physiology, neuroscience, and computer science, finding features that suitably express emotion has become an important problem in this field. Recent research focuses mainly on global statistical features, but it is not clear that these are the most suitable. Moreover, much of the research is based on discrete speech emotion corpora, and there is no widely accepted feature set or recognition method.
To construct the low-level feature set, we extracted three categories of features: prosodic features, voice-quality (qualitative) features, and spectral features. In addition, based on Teager's nonlinear theory combined with Mel-scale perceptual psychology, we extracted a feature called Teager_Mel. Its advantage is that it accounts for the nonlinear process of speech production while also following Mel-scale perception.
We ran experiments on both the discrete speech corpus DISEC and the dimensional speech corpus VAM. The results show that, overall, Teager_Mel is more promising for speech emotion recognition than the commonly used MFCC.
We then applied a series of processing steps to the extracted low-level feature set. Global statistical features, which are statistics computed over the frame-level features of short speech frames, are commonly used. However, we are concerned that they may lose prosodic information, so in this work we search for a segment length better suited to speech emotion recognition.
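For readers unfamiliar with the operator behind Teager_Mel, the following is a minimal sketch of the standard discrete Teager energy operator only. The abstract does not specify how the thesis combines this operator with Mel-scale filtering, so that step is deliberately left out; the function name `teager_energy` and the test tone parameters are illustrative assumptions.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns an array two samples shorter than the input, because the
    operator needs one neighbour on each side of every sample.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


# Illustrative check on a pure tone A*cos(w*n + phi): the operator
# yields the constant A^2 * sin(w)^2, which tracks amplitude and frequency.
n = np.arange(200)
amplitude, omega = 0.5, 0.3  # hypothetical values chosen for the demo
tone = amplitude * np.cos(omega * n + 0.1)
energy = teager_energy(tone)
expected = amplitude ** 2 * np.sin(omega) ** 2
```

For a sinusoid the output is constant, which is why the operator is often described as tracking the energy of the source that produced the signal rather than the energy of the waveform itself.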
We also consider the cognitive process of emotion perception, which comprises three important phases: application, release, and relaxation. We model this perception process with Gaussian functions and thereby obtain windowed features for dimensional speech emotion recognition.
Furthermore, to account for the temporal order of speech features, we propose a Cognitive Mechanism Recurrent Neural Network (CMRNN). The modified network can use not only the short-frame features but also the windowed features and the phase features. We believe that both the temporal sequence of speech features and their statistical features are suitable for dimensional speech emotion recognition.
Finally, we tested the CMRNN on the dimensional corpus VAM and obtained promising accuracy. In particular, the phase features and the emotion-window features each bring varying degrees of improvement, about 0.66 in the correlation coefficient. Fusing the multi-granularity features yields a 16% improvement over the global statistical features. We therefore consider the CMRNN effective.
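The Gaussian-windowed features described above can be illustrated with a small sketch. This is not the thesis's implementation: the function name `gaussian_windowed_features`, the parameters `win_len`, `hop`, and `sigma_ratio`, and the choice of a weighted mean as the window statistic are all assumptions made for illustration.

```python
import numpy as np


def gaussian_windowed_features(frames, win_len, hop, sigma_ratio=0.25):
    """Gaussian-weighted mean of frame-level features over sliding windows.

    frames: array of shape (n_frames, n_dims) holding frame-level features.
    win_len, hop: window length and step, both in frames (hypothetical names).
    sigma_ratio: Gaussian standard deviation as a fraction of win_len (assumed).
    Returns an array of shape (n_windows, n_dims).
    """
    frames = np.asarray(frames, dtype=float)
    t = np.arange(win_len)
    center = (win_len - 1) / 2.0
    sigma = sigma_ratio * win_len
    # Gaussian weights peak at the window centre and are normalised to sum to 1.
    weights = np.exp(-0.5 * ((t - center) / sigma) ** 2)
    weights /= weights.sum()
    starts = range(0, len(frames) - win_len + 1, hop)
    return np.array([weights @ frames[s:s + win_len] for s in starts])


# Usage on synthetic data: 20 frames of 3-dimensional features.
demo_frames = np.ones((20, 3))
demo_windows = gaussian_windowed_features(demo_frames, win_len=10, hop=5)
```

Weighting frames near the window centre more heavily than those at the edges is one simple way to give a segment-level feature a smooth rise and decay, in contrast to a global statistic that treats every frame equally.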
Keywords/Search Tags: dimensional speech emotion recognition, multi-granularity feature extraction, CMRNN, cognitive mechanism