
Research On Speech Emotion Recognition Based On Neural Network

Posted on: 2021-02-05 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Chang | Full Text: PDF
GTID: 2428330614954988 | Subject: Control Engineering
Abstract/Summary:
With the development of the computer industry, artificial intelligence has entered people's lives, and voice is gradually becoming the mainstream channel for human-computer interaction. Speech emotion recognition allows machines to perceive and understand human emotions, and it has broad application prospects in mental health monitoring, educational assistance, personalized content recommendation, and customer service quality monitoring. However, the recognition rates of current speech emotion recognition systems are still too low for large-scale commercial use, so improving recognition accuracy is a difficult and pressing problem. A speech emotion recognition system consists mainly of two parts: speech emotion feature extraction and emotion classification. This thesis contributes to both feature extraction and the design of the recognition network model:

1. A new feature, the RGB statistical spectrogram, is proposed on the basis of the ordinary spectrogram. First, image processing methods extract the R, G, and B components of the spectrogram image, producing three component spectrograms. Second, statistical functions are applied to the RGB component spectrograms to expand their dimensions and produce new statistical spectrograms. Finally, the validity of the features is verified with a convolutional neural network (CNN) that has four convolutional layers. In simulation experiments, the mean statistical spectrogram reaches an accuracy of 57.2%, the variance statistical spectrogram 68.1%, and the maximum statistical spectrogram 54.2%. These results show that the proposed RGB statistical spectrogram can be used for speech emotion classification and that the new features are effective.

2. Because speech signals are temporal, a Long Short-Term Memory (LSTM) neural network, which retains memory across time steps, is used for speech emotion classification. Since fusing different speech features produces very high-dimensional inputs, an attention mechanism is added to the LSTM network. The attention mechanism assigns different weights to different features, which lets the network learn selectively from the high-dimensional fused features. Comparing recognition accuracy across different feature sets and different network structures verifies the effectiveness of the LSTM neural network combined with the attention mechanism for emotion recognition.
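The abstract does not specify the colormap used to render spectrograms as images or the axis over which the statistical functions are applied for the RGB statistical spectrogram. The sketch below is one plausible reading, not the thesis's implementation: it assumes a hypothetical linear blue-green-red colormap, splits the colored spectrogram into R, G, and B component maps, and then computes per-pixel mean, variance, and maximum across the three channels to obtain the mean, variance, and maximum statistical maps.

```python
import statistics

# Hypothetical colormap (assumption): the thesis does not name the colormap of
# its spectrogram images, so a simple linear blue -> green -> red ramp is used.
def colormap(v):
    """Map a normalized magnitude v in [0, 1] to an (R, G, B) triple in [0, 1]."""
    return (v, 1.0 - abs(2.0 * v - 1.0), 1.0 - v)

def normalize(spec):
    """Min-max normalize a 2-D list (frequency x time) to the range [0, 1]."""
    flat = [x for row in spec for x in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant spectrogram
    return [[(x - lo) / span for x in row] for row in spec]

def rgb_components(spec):
    """Color the spectrogram and split it into three component maps R, G, B."""
    norm = normalize(spec)
    rgb = [[colormap(v) for v in row] for row in norm]
    return tuple([[px[c] for px in row] for row in rgb] for c in range(3))

def statistical_maps(r, g, b):
    """Assumed statistics: per-pixel mean, population variance, and maximum
    taken across the R, G, B component maps."""
    mean_map, var_map, max_map = [], [], []
    for r_row, g_row, b_row in zip(r, g, b):
        triples = list(zip(r_row, g_row, b_row))
        mean_map.append([statistics.fmean(t) for t in triples])
        var_map.append([statistics.pvariance(t) for t in triples])
        max_map.append([max(t) for t in triples])
    return mean_map, var_map, max_map

# Usage on a tiny 2 x 2 "spectrogram" (frequency x time magnitudes).
r, g, b = rgb_components([[0.0, 2.0], [4.0, 1.0]])
mean_map, var_map, max_map = statistical_maps(r, g, b)
```

Each resulting map keeps the frequency x time shape of the input, so it could be fed to a 2-D CNN in the same way as the original spectrogram; a real pipeline would compute the magnitudes from framed, windowed audio rather than use hand-written values.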
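The abstract states that attention weights let the LSTM learn selectively from high-dimensional fused features, but it does not give the attention formula. A common form is attention pooling over the LSTM outputs: each hidden state h_t receives a score e_t, the scores are normalized with a softmax into weights alpha_t, and the weighted sum of the hidden states forms the utterance representation passed to the classifier. The minimal sketch below assumes simple dot-product scoring e_t = w . h_t with a hypothetical scoring vector `w` (learned in practice, fixed here for illustration); it is not necessarily the variant used in the thesis.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, w):
    """Attention pooling over a sequence of LSTM outputs.

    hidden_states: list of T vectors h_t, each a list of length d.
    w: hypothetical scoring vector of length d (assumed dot-product scoring).
    Returns (context, weights) with context = sum_t alpha_t * h_t.
    """
    # Score each time step: e_t = w . h_t
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in hidden_states]
    # Normalize the scores into attention weights alpha_t that sum to 1.
    weights = softmax(scores)
    # Weighted sum of hidden states, computed one dimension at a time.
    d = len(hidden_states[0])
    context = [sum(a * h[k] for a, h in zip(weights, hidden_states)) for k in range(d)]
    return context, weights

# Usage: two time steps with 2-D hidden states; w favors the first dimension.
context, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], [math.log(3), 0.0])
```

With `w = [log 3, 0]`, the first step scores log 3 and the second scores 0, so the weights come out as 0.75 and 0.25. Time steps whose hidden states align with `w` therefore dominate the pooled vector, which is the "selective learning" effect the abstract describes.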
Keywords/Search Tags: Speech Emotion Recognition, RGB Statistical Spectrogram, LSTM Neural Network, Attention Mechanism