
Research On Speech Emotion Recognition Based On Deep Learning

Posted on: 2021-03-08    Degree: Master    Type: Thesis
Country: China    Candidate: S Z Li    Full Text: PDF
GTID: 2428330611966444    Subject: Signal and Information Processing
Abstract/Summary:
Speech emotion recognition (SER) is one of the important technologies in human-computer interaction and a hot research topic in speech information processing. Moreover, SER has broad application prospects in many fields, such as medical treatment and education. Although deep learning has brought breakthroughs to SER in recent years, how to quickly and effectively extract emotion features, and how to alleviate data imbalance and achieve domain adaptation, remain key technical issues in SER. Based on deep learning, this thesis carries out research on SER with respect to these key issues. The main contributions are as follows:

(1) Aiming at quickly and effectively extracting emotional features from a long spectrogram, this thesis proposes speech emotion recognition based on a spatiotemporal-frequential cascaded attention mechanism. As a novel lightweight attention mechanism, spatiotemporal-frequential cascaded attention is composed of spatiotemporal attention and frequential attention. Given a long speech spectrogram, spatiotemporal attention adaptively locates proposal emotion regions; within these proposal regions, frequential attention captures emotion features according to their frequency distribution. The two attention stages cooperate with each other and help the neural network extract emotional features from a long speech spectrogram. Experiments on four public datasets demonstrate that the proposed model is very competitive with both the latest and the most influential works in weighted accuracy and unweighted accuracy.

(2) To address data imbalance and domain adaptation across multiple databases, this thesis proposes speech emotion recognition based on center transfer. To alleviate the data imbalance of the source database, several auxiliary databases are added to the source database to keep the data balanced. In this approach, a domain-adversarial neural network constrains the feature distributions of the source and auxiliary domains to align, and a novel center transfer network constrains the auxiliary-domain class centers to transfer toward the source-domain class centers in the feature space. The center transfer network keeps the feature distribution of the training data from multiple databases consistent with that of the test data from the source database, so that the auxiliary databases can effectively alleviate the data imbalance of the source database. Experiments on the IEMOCAP database demonstrate that the proposed model improves the accuracy of each weak class and outperforms the variable-length model, which verifies its effectiveness in alleviating data imbalance.

Speech emotion recognition based on spatiotemporal-frequential cascaded attention and center transfer is verified by quantitative analysis, qualitative analysis, ablation analysis and visualization analysis on four public datasets.
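The two-stage cascade described in (1) can be illustrated with a minimal NumPy sketch. This is not the thesis's actual architecture: the scalar weights `w_st` and `w_fr` are hypothetical stand-ins for learned parameters, and the elementwise gating is a simplification of the proposal mechanism.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cascaded_attention(spec, w_st, w_fr):
    """spec: (freq, time) spectrogram features.
    w_st, w_fr: stand-ins for learned attention parameters."""
    # Stage 1: spatiotemporal attention -- score every (freq, time) cell
    # and softly gate the map toward proposal emotion regions.
    st_mask = sigmoid(spec * w_st)
    proposals = spec * st_mask
    # Stage 2: frequential attention -- weight each frequency bin by its
    # average response inside the proposal regions.
    fr_weight = softmax(proposals.mean(axis=1, keepdims=True) * w_fr, axis=0)
    return proposals * fr_weight        # (freq, time) attended features

spec = np.random.randn(40, 300)         # e.g. 40 mel bins, 300 frames
attended = cascaded_attention(spec, w_st=0.5, w_fr=2.0)
```

Cascading the two stages keeps the mechanism lightweight: the frequency weighting only has to be computed over regions the spatiotemporal stage has already emphasized.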
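The center transfer idea in (2) can likewise be sketched as a loss term. This is a simplified reconstruction of the concept, not the thesis's implementation: the function names are hypothetical, and the full model would combine this loss with the classification and domain-adversarial objectives.

```python
import numpy as np

def class_centers(features, labels, num_classes):
    # Mean feature vector (center) per emotion class.
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(num_classes)])

def center_transfer_loss(src_feats, src_labels, aux_feats, aux_labels,
                         num_classes):
    # Penalize the squared distance between each auxiliary-domain class
    # center and the corresponding source-domain class center, pulling the
    # auxiliary data toward the source feature distribution class by class.
    src_c = class_centers(src_feats, src_labels, num_classes)
    aux_c = class_centers(aux_feats, aux_labels, num_classes)
    return np.mean(np.sum((aux_c - src_c) ** 2, axis=1))
```

Minimizing this term alongside the domain-adversarial objective is what lets the balanced auxiliary data stand in for the under-represented classes of the source database.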
Keywords/Search Tags: speech emotion recognition, deep learning, spatiotemporal-frequential cascaded attention, center transfer network