Research On Speech Emotion Recognition Based On Spectrogram And Statistical Features

Posted on: 2021-01-05; Degree: Master; Type: Thesis
Country: China; Candidate: J Q Wu
GTID: 2438330605463007; Subject: Software engineering
Abstract/Summary:
Speech, as the carrier of human communication, runs through everyone's life. With the rapid development of artificial intelligence, people expect machines not only to convey semantic content but also to understand human emotions. In human-computer interaction, this understanding no longer rests on a mechanical response from the machine; the machine is expected to respond perceptively after receiving a voice signal. Speech emotion recognition is therefore an important factor in improving machine intelligence, and its importance in the field of human-computer interaction is self-evident. To improve the accuracy of speech emotion recognition, make human-computer interaction more harmonious, and better establish the emotional connection between humans and machines, this thesis focuses on the following two aspects.

(1) To address overfitting and low recognition accuracy when deep learning is applied to small data sets, a deep learning model is proposed that combines dual speech and image augmentation with a convolutional neural network (CNN) and a gated recurrent unit (GRU) network. In this method, the original audio is augmented by shifting it up and down. The augmented speech signal is mapped to the Mel scale and a Mel power spectrum is generated. Image augmentation operations such as rotation, shearing, and shifting are then applied to the spectrogram. A fusion model, CGRU, is formed by combining the CNN's ability to recognize frequency-domain features with the GRU network's ability to capture time-series information. This model automatically learns deep spectrum features and performs emotion recognition. The results show that the emotion recognition accuracy of the CGRU method with spectral features exceeds that of the traditional hand-crafted eGeMAPS feature set on the same database, and that the proposed method is competitive on speech emotion recognition tasks. In addition, under the same training parameters, CGRU has lower time complexity than CLSTM.

(2) To obtain emotion information from multiple dimensions, further improve the accuracy of speech emotion recognition, and compensate for the limited representational ability of a single feature type, an AtBiGRU model based on dual-channel features is proposed. This method extracts the deep spectrum features and the HSFs (High-level Statistical Functions) features of the speech signal through two channels. It uses the representational power of deep convolution together with the experience and knowledge embodied in traditional acoustic features to construct fusion features that contain both local and global emotional information. An Attention mechanism assigns a weight to each dimension of the fusion feature. The Attention-weighted fusion feature is then used as the input of a bidirectional GRU model, which captures the time-domain features of the speech signal in both time directions. The results show that, with the forward and backward learning of the AtBiGRU network, the dual-channel features improve the emotion recognition rate on IEMOCAP to some degree compared with the features before fusion. The experiments also examine the influence of different convolutional features on the recognition task and find that VGG16 features are better suited to feature representation for this task than VGG19 features. Network models with different architectures were built in the experiments. The results show that the recognition accuracy of the dual-channel features on the Bidirectional Recurrent Neural Network (BiRNN) model is better than on the unidirectional recurrent neural network (RNN).
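
The Mel-scale mapping mentioned in (1) can be illustrated with a short sketch. The abstract does not say which Mel formula or how many Mel bands the thesis uses, so the HTK-style formula below (one common convention) and the helper names `hz_to_mel`, `mel_to_hz`, and `mel_center_frequencies` are assumptions made only for illustration.

```python
import math
from typing import List


def hz_to_mel(f_hz: float) -> float:
    """Map a frequency in Hz to Mels (HTK-style formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel: map Mels back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_center_frequencies(n_mels: int, f_min: float, f_max: float) -> List[float]:
    """Center frequencies (Hz) of n_mels bands equally spaced on the Mel scale.

    The bands are uniform in Mels, so they are narrow at low frequencies
    and progressively wider at high frequencies when expressed in Hz.
    """
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_mels + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_mels)]


if __name__ == "__main__":
    # Hypothetical settings: 40 bands between 0 Hz and 8 kHz.
    centers = mel_center_frequencies(40, 0.0, 8000.0)
    print([round(c, 1) for c in centers[:5]])
```

A Mel power spectrum would then be obtained by pooling the power spectrum of each frame with filters placed at these center frequencies; that step is left out here because the abstract gives no filter or framing details.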
Keywords/Search Tags: Speech emotion recognition, Spectral features, Dual channel mechanism, Convolutional neural network, Attention mechanism