
Research On Speech Emotion Recognition Based On Deep Learning

Posted on: 2021-03-08    Degree: Master    Type: Thesis
Country: China    Candidate: S Z Li    Full Text: PDF
GTID: 2428330611966444    Subject: Signal and Information Processing
Abstract/Summary:
Speech emotion recognition (SER) is one of the important technologies in human-computer interaction and a hot research topic in speech information processing. Moreover, SER has broad application prospects in many fields, such as medical treatment and education. Although deep learning has brought breakthroughs to SER in recent years, how to quickly and effectively extract emotion features, and how to alleviate data imbalance and achieve domain adaptation, remain key technical issues in SER. Based on deep learning, this thesis carries out research on SER with respect to these key issues. The main contributions are as follows:

(1) Aiming at quickly and effectively extracting emotional features from a long spectrogram, this thesis proposes speech emotion recognition based on a spatiotemporal-frequential cascaded attention mechanism. As a novel lightweight attention mechanism, spatiotemporal-frequential cascaded attention is composed of spatiotemporal attention and frequential attention. Given a long speech spectrogram, spatiotemporal attention adaptively locates proposal emotion regions; within these proposal regions, frequential attention captures emotion features according to their frequency distribution. The two attention stages cooperate with each other and help the neural network extract emotional features from a long speech spectrogram. Experiments on four public datasets demonstrate that the proposed model is very competitive with both the latest and the most influential works in weighted accuracy and unweighted accuracy.

(2) To address data imbalance and domain adaptation across multiple databases, this thesis proposes speech emotion recognition based on center transfer. To alleviate the data imbalance of the source database, several auxiliary databases are added to the source database to keep the data balanced. In this approach, a domain-adversarial neural network constrains the feature distributions of the source and auxiliary domains to align, and a novel center transfer network constrains the auxiliary-domain class centers to transfer toward the source-domain class centers in the feature space. The center transfer network keeps the feature distribution of the training data from multiple databases consistent with that of the test data from the source database, so that the auxiliary databases can effectively alleviate the data imbalance of the source database. Experiments on the IEMOCAP database demonstrate that the proposed model improves the accuracy of each weak class and outperforms the variable-length model, which verifies its effectiveness in alleviating data imbalance.

Speech emotion recognition based on spatiotemporal-frequential cascaded attention and center transfer is verified by quantitative analysis, qualitative analysis, ablation analysis and visualization analysis on four public datasets.
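The two-stage cascade described in (1) can be illustrated with a minimal NumPy sketch. This is not the thesis's actual architecture: the scalar weights `w_st` and `w_fr` are hypothetical stand-ins for learned parameters, and the elementwise gating is a simplification of the proposal mechanism.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cascaded_attention(spec, w_st, w_fr):
    """spec: (freq, time) spectrogram features.
    w_st, w_fr: stand-ins for learned attention parameters."""
    # Stage 1: spatiotemporal attention -- score every (freq, time) cell
    # and softly gate the map toward proposal emotion regions.
    st_mask = sigmoid(spec * w_st)
    proposals = spec * st_mask
    # Stage 2: frequential attention -- weight each frequency bin by its
    # average response inside the proposal regions.
    fr_weight = softmax(proposals.mean(axis=1, keepdims=True) * w_fr, axis=0)
    return proposals * fr_weight        # (freq, time) attended features

spec = np.random.randn(40, 300)         # e.g. 40 mel bins, 300 frames
attended = cascaded_attention(spec, w_st=0.5, w_fr=2.0)
```

Cascading the two stages keeps the mechanism lightweight: the frequency weighting only has to be computed over regions the spatiotemporal stage has already emphasized.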
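The center transfer idea in (2) can likewise be sketched as a loss term. This is a simplified reconstruction of the concept, not the thesis's implementation: the function names are hypothetical, and the full model would combine this loss with the classification and domain-adversarial objectives.

```python
import numpy as np

def class_centers(features, labels, num_classes):
    # Mean feature vector (center) per emotion class.
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(num_classes)])

def center_transfer_loss(src_feats, src_labels, aux_feats, aux_labels,
                         num_classes):
    # Penalize the squared distance between each auxiliary-domain class
    # center and the corresponding source-domain class center, pulling the
    # auxiliary data toward the source feature distribution class by class.
    src_c = class_centers(src_feats, src_labels, num_classes)
    aux_c = class_centers(aux_feats, aux_labels, num_classes)
    return np.mean(np.sum((aux_c - src_c) ** 2, axis=1))
```

Minimizing this term alongside the domain-adversarial objective is what lets the balanced auxiliary data stand in for the under-represented classes of the source database.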
Keywords/Search Tags: speech emotion recognition, deep learning, spatiotemporal-frequential cascaded attention, center transfer network