
Research On Speech Emotion Recognition Based On Improved Convolutional Neural Network

Posted on: 2021-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: Z G Xia
GTID: 2428330614959284
Subject: Industrial engineering
Abstract/Summary:
Speech emotion recognition is an important branch of artificial intelligence. It is generally regarded as one of the key ways to achieve intelligent human-computer interaction, and it has been widely applied in intelligent dialogue systems, public opinion monitoring, service robots and other fields. In recent years, with the rapid development of deep learning, applying deep learning to speech emotion recognition has become an active and effective line of research, and the Convolutional Neural Network (CNN) in particular has quickly become one of the main recognition models studied. However, several problems with CNNs still merit investigation. First, studies that use a CNN as the recognition model usually take the spectrogram as the input feature, but the fine details of a spectrogram are not prominent, which leads to low recognition accuracy. Second, a CNN loses feature information as its convolutional layers get deeper, which is a key factor limiting further improvement of the recognition rate. To address these problems, this thesis carries out the following research and experiments.

First, because the details of the spectrogram are not prominent, this thesis designs a spectrogram texture feature extraction algorithm based on Log-Gabor filters and an improved Local Binary Pattern (LBP). The algorithm first applies Log-Gabor filters at five scales and eight orientations to enhance the detail information of the spectrogram, then uses the improved LBP to extract texture features from the filtered spectrogram at each scale and orientation, and finally reconstructs the extracted texture features into the final feature. The extracted features are compared with Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictive Cepstral Coefficients (LPCC), and the experimental results show that this method effectively improves the emotion recognition rate.

Second, to address the loss of features as the convolutional layers of a CNN deepen, this thesis designs a multi-level residual Convolutional Neural Network. The network uses a residual structure whose shortcuts span multiple convolutional layers to compensate for lost features; restoring the original feature information improves network performance and thus the recognition rate. Experimental results show that, on the EmoDB and CASIA datasets, the proposed model achieves a better recognition rate, faster convergence and higher classification accuracy than the methods in the cited references.

Finally, this thesis develops a speech emotion recognition system that uses a Jetson Nano as the host computer together with an intelligent service robot, and applies the Log-Gabor and improved LBP spectrogram feature algorithm and the multi-level residual convolutional neural network to this system. The experimental results demonstrate the advantages of the proposed algorithms and the practicality of the speech emotion recognition system.
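The abstract does not give the Log-Gabor parameters, so the following is only a minimal NumPy sketch of a frequency-domain Log-Gabor bank with five scales and eight orientations applied to a spectrogram image. The function names `log_gabor_bank` and `log_gabor_magnitudes` are hypothetical, and the minimum wavelength, scale multiplier, bandwidth ratio and angular spread are illustrative assumptions rather than the thesis's settings.

```python
import numpy as np


def log_gabor_bank(shape, n_scales=5, n_orient=8, min_wavelength=3.0,
                   mult=2.1, sigma_on_f=0.55, d_theta_on_sigma=1.5):
    """Frequency-domain Log-Gabor filters, shape (n_scales, n_orient, rows, cols).

    All keyword defaults are illustrative assumptions, not the thesis's settings.
    """
    rows, cols = shape
    # Frequency grids in cycles/pixel, laid out in unshifted FFT order.
    u, v = np.meshgrid(np.fft.fftfreq(cols), np.fft.fftfreq(rows))
    radius = np.sqrt(u ** 2 + v ** 2)
    radius[0, 0] = 1.0                      # placeholder so log() is defined at DC
    theta = np.arctan2(-v, u)
    sin_t, cos_t = np.sin(theta), np.cos(theta)
    theta_sigma = np.pi / n_orient / d_theta_on_sigma

    bank = np.empty((n_scales, n_orient, rows, cols))
    for s in range(n_scales):
        f0 = 1.0 / (min_wavelength * mult ** s)   # centre frequency of scale s
        radial = np.exp(-np.log(radius / f0) ** 2 / (2 * np.log(sigma_on_f) ** 2))
        radial[0, 0] = 0.0                        # Log-Gabor filters have no DC response
        for o in range(n_orient):
            angle = o * np.pi / n_orient
            # Absolute angular distance to the filter orientation, wrapped to [0, pi].
            d_sin = sin_t * np.cos(angle) - cos_t * np.sin(angle)
            d_cos = cos_t * np.cos(angle) + sin_t * np.sin(angle)
            d_theta = np.abs(np.arctan2(d_sin, d_cos))
            angular = np.exp(-d_theta ** 2 / (2 * theta_sigma ** 2))
            bank[s, o] = radial * angular
    return bank


def log_gabor_magnitudes(spectrogram, bank):
    """Filter a 2-D spectrogram image with every filter and return response magnitudes."""
    spectrum = np.fft.fft2(spectrogram)
    return np.abs(np.fft.ifft2(spectrum[None, None] * bank))
```

Because each filter covers essentially one half of the frequency plane, the inverse FFT is complex, and its magnitude gives the local amplitude of the spectrogram at that scale and orientation. The result is a stack of 5 × 8 = 40 enhanced maps, one per scale and orientation.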
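The abstract does not say what the "improved" LBP changes, so the sketch below substitutes the basic 3×3, 8-neighbour LBP operator. It reads "reconstruct the extracted texture features" as concatenating the normalised 256-bin LBP histograms of all Log-Gabor magnitude maps (for example the 5 × 8 scale/orientation maps). Both the substitution and that reading are assumptions, and the function names are hypothetical.

```python
import numpy as np

# 3x3 neighbour offsets (dy, dx), clockwise from the top-left corner.
# Neighbour k sets bit k of the code when its value is >= the centre pixel.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(image):
    """Basic (not rotation-invariant) 8-neighbour LBP codes, 0..255, for interior pixels."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    centre = img[1:h - 1, 1:w - 1]
    codes = np.zeros(centre.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(NEIGHBOUR_OFFSETS):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= centre).astype(np.int64) << bit
    return codes


def lbp_histogram(image):
    """Normalised 256-bin histogram of the LBP codes of one image."""
    hist = np.bincount(lbp_codes(image).ravel(), minlength=256).astype(float)
    return hist / hist.sum()


def texture_feature(response_maps):
    """Concatenate per-map LBP histograms; input shape (..., rows, cols)."""
    maps = np.asarray(response_maps, dtype=float)
    flat = maps.reshape(-1, maps.shape[-2], maps.shape[-1])
    return np.concatenate([lbp_histogram(m) for m in flat])
```

For a stack of shape (5, 8, rows, cols), `texture_feature` returns a vector of 40 × 256 = 10240 values. A rotation-invariant or uniform-pattern variant would shrink this considerably, which is one plausible direction an "improved" LBP could take.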
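The abstract describes the multi-level residual structure only as a shortcut that spans several convolutional layers. One possible reading is sketched below in NumPy as a forward pass only, with no training. The block input is added to the output of a stack of 3×3 convolutions instead of skipping a single layer. The function names, zero ("same") padding, ReLU placement and channel-preserving shapes are all assumptions.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def conv3x3_same(x, weight, bias):
    """3x3 cross-correlation with zero padding.

    x: (C_in, H, W), weight: (C_out, C_in, 3, 3), bias: (C_out,) -> (C_out, H, W)
    """
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((weight.shape[0], h, w))
    for i in range(3):
        for j in range(3):
            window = padded[:, i:i + h, j:j + w]
            out += np.einsum("oc,chw->ohw", weight[:, :, i, j], window)
    return out + bias[:, None, None]


def multilevel_residual_block(x, weights, biases):
    """Run len(weights) conv layers and add the block input back after the last one.

    The shortcut therefore spans every layer in the block rather than a single layer.
    Channel count is assumed to stay the same so the addition is shape-compatible.
    """
    h = x
    for k, (w, b) in enumerate(zip(weights, biases)):
        h = conv3x3_same(h, w, b)
        if k < len(weights) - 1:   # ReLU between layers; the last is applied after the sum
            h = relu(h)
    return relu(h + x)
```

With all weights at zero, the convolutional path contributes nothing and the block returns `relu(x)`, so non-negative input features pass through unchanged. This is the sense in which a shortcut spanning several layers can make up for information lost in the deeper layers.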
Keywords/Search Tags: speech emotion recognition, CNN, spectrogram, residual network, LBP texture feature