Research On Speech Emotion Recognition Model Based On Deep Neural Network

Posted on: 2022-04-03  Degree: Master  Type: Thesis
Country: China  Candidate: H Li  Full Text: PDF
GTID: 2518306572465204  Subject: IC Engineering
Abstract/Summary:
Emotion is a complex physiological and psychological activity. It is a distinctive and important characteristic of human beings and plays an important role in their everyday social interactions. Human-computer interaction technology is developing rapidly, and recognizing emotion during that interaction is of great value for correctly understanding people's true intentions. Affective computing has therefore become a hot research topic. It mainly takes voice, video, semantics, and similar signals as input to estimate a person's emotional state. Speech is rich in emotional information and is one of the main research directions of affective computing. Within the speech signal, the spectrogram can capture all the emotional features present in the speech.

Research has found that existing spectrogram-based speech emotion recognition algorithms have the following problems. First, neutral emotion is confused with the other three non-neutral emotions, so recognition accuracy for specific emotions is low. Second, variable-length speech input can produce a large number of features, some of which contribute little to emotion recognition. Third, the training speech data must be padded to a uniform length for batch training, which degrades model performance.

In response to these problems, this thesis studies and improves deep-learning-based speech emotion recognition in terms of data processing, feature extraction, and model structure. It uses speech input to distinguish four emotion categories (happy, angry, sad, and neutral) and takes the spectrogram as the input to the emotion recognition model.

To address the emotional confusion caused by speech segmentation, the thesis designs a dedicated data connection between CNN and Bi-LSTM and builds a variable-length emotion recognition model that contains both structures. The model accepts whole-sentence speech input of different lengths, and its parameters do not need to be adjusted dynamically. This method effectively resolves the confusion between neutral emotion and the other three non-neutral emotions and improves the model's recognition accuracy by 7.8%.

To address the extraction of emotional features from whole-sentence speech, the thesis proposes a variable-length speech emotion recognition algorithm based on the attention mechanism, and designs a spectrogram-oriented spatiotemporal attention module and a CNN-oriented convolution channel attention module. For the unimportant parts of the spectrogram's spatiotemporal data and of the CNN convolution channel features, the attention mechanism reduces their contribution to subsequent recognition. This increases the share of key data and features and improves recognition accuracy.

To address zero padding in batch training, the thesis proposes a data cutting and restoring method that cuts the data before the model's first convolution layer and removes the padded part, overcoming the influence of padding values on model training.

The attention-based variable-length speech emotion recognition model designed in this thesis accepts speech of different lengths, is unaffected by data padding, and pays more attention to important features. Compared with recent spectrogram-based speech emotion recognition work, its recognition accuracy is 2.5% higher. It effectively resolves the problem of emotional confusion and extracts the emotional features contained in the spectrogram more efficiently.
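The padding problem and the data cutting and restoring idea mentioned in the abstract can be illustrated with a small sketch. This is not the thesis's implementation, whose details the abstract does not give. It only assumes that each utterance is a list of spectrogram frames, that a batch is zero-padded to its longest utterance, and that the true lengths are kept so the padded frames can be cut away again. The names `pad_batch`, `cut_padding`, and `frame_mean` are hypothetical.

```python
from typing import List, Tuple

Frame = List[float]        # one spectrogram time frame (one value per frequency bin)
Utterance = List[Frame]    # a variable number of frames


def pad_batch(utterances: List[Utterance],
              pad_value: float = 0.0) -> Tuple[List[Utterance], List[int]]:
    """Pad every utterance to the longest one and remember the true lengths."""
    lengths = [len(u) for u in utterances]
    max_len = max(lengths)
    n_bins = len(utterances[0][0])
    padded = [
        [list(f) for f in u] + [[pad_value] * n_bins for _ in range(max_len - len(u))]
        for u in utterances
    ]
    return padded, lengths


def cut_padding(padded: List[Utterance], lengths: List[int]) -> List[Utterance]:
    """Cut each item back to its true length, discarding the padded frames."""
    return [u[:n] for u, n in zip(padded, lengths)]


def frame_mean(utterance: Utterance) -> float:
    """Mean over all values; a stand-in for any statistic a later layer computes."""
    values = [v for frame in utterance for v in frame]
    return sum(values) / len(values)


# Toy batch (assumed data): two utterances with 4 and 2 frames, 2 frequency bins each.
batch = [
    [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
padded, lengths = pad_batch(batch)
restored = cut_padding(padded, lengths)

print(frame_mean(padded[1]))    # padded zeros drag the statistic toward zero
print(frame_mean(restored[1]))  # statistic computed over the real frames only
```

In this toy batch the shorter utterance's mean is halved by the padded zeros and recovers once the padding is cut. This is the kind of distortion that removing padded data before the first convolution layer is meant to avoid.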
Keywords/Search Tags: speech emotion recognition, spectrogram, neural network, attention mechanism, variable-length speech input