Research On Speech Emotion Recognition Based On Attention Mechanism

Posted on: 2021-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: D F Yu
GTID: 2438330602998320
Subject: Software engineering
Abstract/Summary:
The goal of speech emotion recognition is to imitate the human perception mechanism. The main task is to extract emotion-related features from a piece of speech and then train specific models on these features to obtain an emotion label for each piece of speech. Against this background, this paper carries out the following research:

1. Extraction of speech features. The disadvantage of MFCC features is that they ignore the high correlation between features. The network structure of deep learning can extract this correlation better, so that the model is trained better. In view of the shortcomings of MFCC, log fbank features are used instead. Log fbank preserves the correlation between features better, is more suitable for neural network-based models, and yields better results than MFCC. Both deltas and delta-deltas are used to retain the dynamic information in the spectrum and further improve the recognition accuracy of the model. Different combinations of speech features are also tested to find the one that gives the best experimental results. On this basis, CNN convolution is further studied to obtain deeper features, and it performs better than the original experiment.

2. Study of the self-attention mechanism. Speech as input also carries its own timing information. For a given frame of speech, the filter features of one or several frames before or after it have some effect on that frame. For such a structure, this paper proposes a self-attention mechanism model to train the features. The self-attention mechanism is used to capture the relevant weight information in the time sequence, and the RNN unit is abandoned, so the parallelism of the GPU can be fully exploited to speed up training. At the same time, some innovations are made on the basis of the model. Following the way models in the NLP field learn word vectors, a position vector matrix algorithm is used to obtain the relative position information of the sequence. Furthermore, the information acquired by the attention mechanism is further processed with a one-dimensional convolution.

3. Study of the capsule network. The capsule network and the attention mechanism are similar in principle: both calculate a representation of each layer. The existence probability in a capsule network can be regarded as equivalent to the weight in the attention mechanism. The paper studies their similarities and differences. Treating the capsule network as a special form of attention, this paper studies its effect on emotion recognition, and the results show that the capsule network performs well on this task.
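
The feature pipeline described in the abstract (static features, deltas and delta-deltas, position information, then self-attention over frames) can be illustrated with a small sketch. This is not the thesis's implementation. It assumes the standard regression formula for deltas, the sinusoidal position encoding commonly used in NLP Transformers (the abstract does not say which position algorithm was used), and identity projections in place of learned query, key, and value matrices. The function names and the toy input are hypothetical.

```python
import math

def deltas(frames, width=2):
    """Regression-based deltas; edge frames are repeated as padding (assumed convention)."""
    n_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(n_frames):
        row = []
        for k in range(len(frames[0])):
            acc = 0.0
            for n in range(1, width + 1):
                nxt = frames[min(t + n, n_frames - 1)][k]
                prv = frames[max(t - n, 0)][k]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out

def positional_encoding(n_frames, dim):
    """Sinusoidal position vectors (an assumption; the thesis may use another scheme)."""
    pe = []
    for pos in range(n_frames):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (no learned projections)."""
    dim = len(x[0])
    scale = math.sqrt(dim)
    weights = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in x]
        weights.append(softmax(scores))
    context = [[sum(w * v[j] for w, v in zip(ws, x)) for j in range(dim)]
               for ws in weights]
    return context, weights

# Toy stand-in for log fbank frames: 4 frames, 2 filterbank channels (made-up values).
frames = [[0.0, 1.0], [1.0, 0.5], [2.0, 0.0], [3.0, 0.5]]
d1 = deltas(frames)   # deltas
d2 = deltas(d1)       # delta-deltas
stacked = [f + a + b for f, a, b in zip(frames, d1, d2)]  # 6 values per frame
pe = positional_encoding(len(stacked), len(stacked[0]))
x = [[v + p for v, p in zip(row, prow)] for row, prow in zip(stacked, pe)]
context, weights = self_attention(x)
```

Each row of `weights` shows how strongly one frame attends to every other frame. Because no step depends on the previous frame's output, all rows can be computed in parallel, which is the property the abstract cites as the reason for dropping the RNN unit.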
Keywords/Search Tags: Speech Emotion, Self-attention, Log fbank, Capsule Network, Neural Network