
Deep Learning Models For Speech Emotion Recognition

Posted on: 2021-03-03  Degree: Master  Type: Thesis
Country: China  Candidate: H Y Zhang  Full Text: PDF
GTID: 2428330620975887  Subject: Computer application technology
Abstract/Summary:
Emotional intelligence plays a particularly important role in human activities, and determining the emotion category is at its core. The same semantic content may express different emotions, and different speakers express emotion in different ways, so understanding semantic information alone is not enough for a computer to fully grasp the speaker's intention; to do so, the computer must also possess emotional intelligence. The purpose of speech emotion recognition is to use a computer to extract from speech the features that best represent emotion and to determine the speaker's emotion category from those features, thereby enabling better human-computer interaction.

Research on speech emotion recognition faces three main problems: (1) the lack of unified standards for database construction; (2) the lack of features that best represent speech emotion; (3) the poor generalization and robustness of acoustic models. In view of these problems and the strengths of different neural networks, the contributions of this study are as follows:

(1) A new acoustic model for speech emotion recognition is constructed by combining a recurrent neural network, a convolutional neural network, and a deep residual network: the recurrent network processes temporal information, the convolutional network captures spatial information, and the residual connections alleviate gradient explosion and gradient vanishing.
(2) An attention mechanism and a mask operation are introduced into the neural acoustic model: the attention mechanism focuses on emotionally salient regions, and the mask operation extracts the regions of interest in the speech.
(3) Four new deep learning models are proposed: an attention-based advanced long short-term memory network (AA-LSTM), an attention-based convolutional bi-directional long short-term memory network (CBAM), an attention-based skip-convolutional bi-directional long short-term memory network (SCBAM), and an attention-based skip-convolutional bi-directional long short-term memory network with masking operations (SCBAMM).
(4) The speech is converted into a spectrogram, from which the four proposed models extract 34-dimensional deep learning features; these are combined with 2-dimensional hand-crafted features such as the harmonics-to-noise ratio and pitch, and the combination of spectral features and speech acoustic features is used as the input to the acoustic model.
(5) The performance of the four new deep learning models is verified on the EMO-DB database.

Experiments show that the four proposed models, AA-LSTM, CBAM, SCBAM, and SCBAMM, achieve recognition accuracies of 70.09%, 56.07%, 64.49%, and 72.09%, respectively, on the EMO-DB emotional speech database, so SCBAMM achieves the best classification among them. Compared with classification models from other researchers, SCBAMM also achieves the best performance. This is because SCBAMM not only effectively extracts the time-frequency features that best represent emotion, but also combines the advantages of recurrent, convolutional, and deep residual networks, giving it strong modeling ability.
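Contribution (2) pairs attention pooling with a mask operation: attention weights emotionally salient frames more heavily, and the mask zeroes out frames that are not of interest (e.g. padding). A minimal NumPy sketch of that idea follows; the function `attention_pool`, the random scoring vector `w`, and the toy dimensions (120 frames, 34-dim features) are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(H, w, mask=None):
    """Collapse frame-level features H (T, d) into one utterance-level
    vector. Frames with higher scores (emotionally salient regions)
    contribute more; masked-out frames contribute nothing."""
    scores = H @ w                                 # one relevance score per frame
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)   # mask operation: drop frames
    alpha = softmax(scores)                        # attention weights, sum to 1
    return alpha @ H                               # weighted sum over frames

rng = np.random.default_rng(0)
H = rng.normal(size=(120, 34))   # e.g. 120 frames of 34-dim deep features
w = rng.normal(size=34)          # learned scoring vector (random stand-in here)
mask = np.arange(120) < 100      # pretend the last 20 frames are padding
utt_vec = attention_pool(H, w, mask)
print(utt_vec.shape)             # (34,)
```

In a trained model, `H` would come from the BiLSTM outputs and `w` would be learned jointly with the rest of the network; the pooling itself is unchanged.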
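Contribution (4) appends hand-crafted descriptors such as pitch to the learned features. One common way to estimate pitch is the autocorrelation method, sketched below on a synthetic voiced frame; the function name `estimate_f0`, the 16 kHz rate, and the 50–400 Hz search band are assumptions for illustration, not taken from the thesis.

```python
import numpy as np

def estimate_f0(frame, sr, fmin=50.0, fmax=400.0):
    """Estimate the fundamental frequency of one voiced frame by finding
    the autocorrelation peak inside the plausible pitch-lag range."""
    frame = frame - frame.mean()                 # remove DC offset
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)      # lag range for [fmin, fmax]
    lag = lo + int(np.argmax(r[lo:hi]))          # strongest periodicity
    return sr / lag

sr = 16000
t = np.arange(1024) / sr
frame = np.sin(2 * np.pi * 200.0 * t)   # synthetic 200 Hz voiced frame
print(round(estimate_f0(frame, sr), 1))  # 200.0
```

The resulting scalar (together with, e.g., a harmonics-to-noise ratio) can simply be concatenated with the deep features before the classifier, which is the combination scheme the abstract describes.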
Keywords/Search Tags: Speech emotion recognition, feature extraction, attention mechanism, deep neural network