
Research on Cross-Corpus Speech Emotion Recognition Based on Domain Adversarial Training

Posted on: 2022-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: W L Zheng
Full Text: PDF
GTID: 2518306740979879
Subject: Biomedical engineering
Abstract/Summary:
Speech is one of the most common means of communication and carries rich emotional information. A major challenge in human-computer interaction is predicting a person's emotional state from their speech. Speech emotion recognition (SER) is the process of automatically recognizing the emotions conveyed by speech data, and it is a critical issue in pattern recognition and affective computing. Interference from environmental noise, speaker identity, and language makes it hard to represent the emotional information in speech signals, which in turn restricts the generalization of SER systems. Cross-corpus speech emotion recognition addresses this by training and testing models on different databases, with the aim of improving generalization in the wild. To minimize the discrepancy between corpora, this thesis combines domain adaptation with deep learning and studies in depth a key issue of cross-corpus SER, namely cross-corpus feature alignment. The major contributions are as follows.

(1) We propose a novel Global Local Adversarial Network (GLAN) that extracts discriminative and generalized emotional speech features and models the SER problem in terms of sequential patterns. We propose a feature extraction method operating on global, local, and hybrid timescales that blends the merits of hand-crafted features and deep features. We also use an attention network to select the emotion-related parts of the speech signal, yielding discriminative speech features. In addition, to obtain generalized speech features, domain discriminators at hierarchical levels are introduced into the emotion recognition framework to reduce the gap between the source domain and the target domain at the global, local, and hybrid levels.

(2) We propose a cross-corpus SER method based on Conditional Adversarial Domain Adaptation. To eliminate differences in emotional speech features across databases while preserving the discriminability of the features, the method builds on contribution (1) and introduces both the feature representation and the prediction information into domain adaptation, effectively capturing the interaction between them. Specifically, a conditional discriminator is introduced to distinguish the cross-covariance of speech features and emotion prediction information between the source and target domains. An emotion prediction network predicts emotion categories from the speech features in order to capture discriminative emotional speech features. The two modules are trained jointly in an adversarial manner. The method uses the correlation between speech features and predicted label information to characterize the structure of the speech emotion categories, and it achieves more accurate matching of the domain feature distributions.

(3) We build a speech emotion recognition system that supports speech audio playback, speech feature extraction, and speech emotion recognition. It can play speech data and display the extracted spectrogram features and the emotion recognition result.
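The conditioning step in contribution (2) can be illustrated with a minimal sketch. It assumes the cross-covariance input to the conditional discriminator is formed as the flattened outer product of a feature vector and the softmax emotion prediction, as in the standard multilinear-conditioning formulation of conditional adversarial domain adaptation; the abstract does not state the exact construction used in the thesis. The function names, the three-dimensional feature vector, and the four emotion classes are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Convert raw emotion logits into class probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multilinear_map(features, class_probs):
    """Flattened outer product of a feature vector and class probabilities.

    Assumption: this joint representation is what the conditional
    discriminator receives, so it sees how features interact with the
    predicted emotion rather than the features alone.
    """
    return [f * p for f in features for p in class_probs]


def domain_logit(joint, weights, bias=0.0):
    """Hypothetical linear domain discriminator: source vs. target score."""
    return sum(w * x for w, x in zip(weights, joint)) + bias


# Illustrative values only: 3 feature dimensions, 4 emotion classes.
features = [0.5, -1.0, 2.0]
probs = softmax([2.0, 0.0, 0.0, 0.0])
joint = multilinear_map(features, probs)
score = domain_logit(joint, [0.1] * len(joint))
```

In this sketch the joint vector has one entry per (feature, class) pair, so its length is the product of the feature dimension and the number of emotion classes. In adversarial training the discriminator would be trained to separate source from target joint vectors while the feature extractor is trained to make them indistinguishable; that optimization loop is omitted here.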
Keywords/Search Tags:speech emotion recognition, domain adaptation, adversarial training, deep neural network