
Research On Emotion Recognition Technology Based On Speech Information

Posted on: 2022-12-11
Degree: Master
Type: Thesis
Country: China
Candidate: J B Luo
Full Text: PDF
GTID: 2518306764480614
Subject: Automation Technology
Abstract/Summary:
With the rapid development of artificial intelligence, research on speech recognition and image processing has become increasingly extensive and in-depth. Speech emotion recognition has gradually attracted attention because it makes machines more intelligent: a machine that can capture human emotions can make voice interaction between humans and machines more natural. At present, most research on speech emotion recognition uses deep learning methods. However, these methods simply feed the spectrogram into a convolutional neural network (CNN) and treat recognition as an image recognition task, which yields poor recognition results. This thesis therefore focuses on the feature extraction method for spectrograms and on improving the CNN recognition model. The main research work is as follows.

First, for speech feature extraction, this thesis proposes a spectrogram feature extraction method that uses the two-dimensional Log-Gabor transform and an improved LBP algorithm. The spectrogram is normalized and converted to grayscale, and the grayscale spectrogram is then amplified in different directions and at different scales by the two-dimensional Log-Gabor transform, which addresses the problem that the texture details of the original spectrogram are not obvious. Texture features are then extracted from the resulting grayscale spectrograms with the improved LBP algorithm.

Second, for the emotion recognition model, this thesis designs a residual convolutional neural network. Compared with a traditional CNN, the residual CNN adds a residual structure that addresses the feature loss caused by increasing the number of convolutional layers, which effectively prevents the drop in recognition accuracy that this feature loss would otherwise cause.

Finally, experiments on the EmoDB dataset after data sample amplification first verify that the features extracted with the two-dimensional Log-Gabor transform and the improved LBP algorithm improve recognition accuracy more than traditional features do. Experiments then verify that the residual convolutional neural network designed in this thesis outperforms a convolutional neural network without the residual structure in training convergence speed, recognition accuracy, and classification accuracy. On this basis, experiments also verify that adding an appropriate amount of Gaussian noise to the spectrogram can improve the recognition rate to some extent. In summary, this research provides a reference for improving the recognition rate of speech emotion recognition.
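The feature extraction pipeline described in the abstract (normalize, filter with a two-dimensional Log-Gabor bank over several scales and orientations, then take LBP texture histograms) can be illustrated with a minimal NumPy sketch. This is not the thesis's implementation: the abstract does not specify the improved LBP variant or any filter parameters, so the code uses the standard 3x3 LBP operator as a stand-in, and every function name and parameter value here (`log_gabor_bank`, `min_wavelength`, `sigma_on_f`, and so on) is an assumption chosen for illustration.

```python
import numpy as np

def log_gabor_bank(shape, n_scales=2, n_orients=4, min_wavelength=3.0,
                   mult=2.0, sigma_on_f=0.55, sigma_theta=np.pi / 8):
    """Frequency-domain 2-D Log-Gabor filters (all parameter values are illustrative)."""
    rows, cols = shape
    fy = np.fft.fftfreq(rows)[:, None]
    fx = np.fft.fftfreq(cols)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    radius[0, 0] = 1.0                      # placeholder to avoid log(0) at DC
    theta = np.arctan2(fy, fx)              # orientation of each frequency bin
    filters = []
    for s in range(n_scales):
        f0 = 1.0 / (min_wavelength * mult ** s)   # centre frequency of this scale
        radial = np.exp(-(np.log(radius / f0) ** 2) / (2 * np.log(sigma_on_f) ** 2))
        radial[0, 0] = 0.0                  # Log-Gabor filters have no DC component
        for o in range(n_orients):
            angle = o * np.pi / n_orients
            # wrapped angular distance to the filter orientation
            d_theta = np.abs(np.arctan2(np.sin(theta - angle), np.cos(theta - angle)))
            angular = np.exp(-(d_theta ** 2) / (2 * sigma_theta ** 2))
            filters.append(radial * angular)
    return filters

def log_gabor_responses(gray, filters):
    """Magnitude response of the image to each filter."""
    spectrum = np.fft.fft2(gray)
    return [np.abs(np.fft.ifft2(spectrum * f)) for f in filters]

def lbp_basic(img):
    """Standard 3x3 LBP codes for interior pixels (stand-in for the improved LBP)."""
    center = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        codes += (neighbor >= center).astype(np.uint8) * np.uint8(1 << bit)
    return codes

def spectrogram_features(gray):
    """Normalize, filter at each scale/orientation, and concatenate LBP histograms."""
    gray = (gray - gray.min()) / (gray.max() - gray.min() + 1e-12)
    feats = []
    for resp in log_gabor_responses(gray, log_gabor_bank(gray.shape)):
        hist = np.bincount(lbp_basic(resp).ravel(), minlength=256)
        feats.append(hist / hist.sum())
    return np.concatenate(feats)
```

With the default two scales and four orientations, `spectrogram_features` returns one 256-bin histogram per filter, giving a 2048-dimensional vector for a single grayscale spectrogram.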
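The role of the residual structure can be seen in a toy example. The sketch below is an assumption-laden illustration in plain NumPy, not the network designed in the thesis: it uses a single channel, two odd-sized convolution kernels, and hypothetical names (`conv2d_same`, `plain_block`, `residual_block`). It shows that a residual block computes `relu(F(x) + x)`, so the input still reaches the output even when the convolutional branch `F` contributes nothing.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Single-channel 2-D cross-correlation with zero padding (odd-sized kernel)."""
    kh, kw = kernel.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def relu(x):
    return np.maximum(x, 0.0)

def plain_block(x, k1, k2):
    """Two stacked convolutions without a skip connection."""
    return relu(conv2d_same(relu(conv2d_same(x, k1)), k2))

def residual_block(x, k1, k2):
    """Same two convolutions, with the input added back before the final ReLU."""
    return relu(conv2d_same(relu(conv2d_same(x, k1)), k2) + x)
```

For a non-negative input and all-zero kernels, `plain_block` returns zeros while `residual_block` returns the input unchanged, which is the sense in which the skip connection guards against feature loss as layers are stacked.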
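Adding Gaussian noise to spectrograms can be sketched as follows. The abstract does not state the noise level or how the noisy spectrograms are used, so this example makes two assumptions: spectrograms are normalized to the range [0, 1], and noisy copies are appended to the training set as extra samples. The names `add_gaussian_noise` and `augment` and the default `sigma=0.01` are hypothetical.

```python
import numpy as np

def add_gaussian_noise(spec, sigma=0.01, rng=None):
    """Add zero-mean Gaussian noise and clip back to the assumed [0, 1] range."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = spec + rng.normal(0.0, sigma, size=spec.shape)
    return np.clip(noisy, 0.0, 1.0)

def augment(specs, labels, copies=2, sigma=0.01, seed=0):
    """Return the original samples followed by `copies` noisy versions of each."""
    rng = np.random.default_rng(seed)
    out_x, out_y = list(specs), list(labels)
    for spec, label in zip(specs, labels):
        for _ in range(copies):
            out_x.append(add_gaussian_noise(spec, sigma, rng))
            out_y.append(label)
    return np.stack(out_x), np.array(out_y)
```

A suitable `sigma` would need to be found experimentally; too much noise would obscure the texture details that the features rely on.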
Keywords/Search Tags: Speech Emotion Recognition, Residual Convolutional Neural Network, Spectrogram, Log-Gabor Transform, LBP Algorithm