Research On Cross-Corpus Speech Emotion Recognition Based On Domain Adversarial Training

Posted on:2022-04-30

Degree:Master

Type:Thesis

Country:China

Candidate:W L Zheng

Full Text:PDF

GTID:2518306740979879

Subject:Biomedical engineering

Abstract/Summary:

Speech is one of the most commonly used means of communication and carries rich emotional information. One of the major challenges of human-computer interaction is predicting a person's emotional state from their speech data. Speech emotion recognition (SER) is the process of automatically recognizing the emotions conveyed by speech data, and it is one of the critical issues in pattern recognition and affective computing. Environmental noise, speaker identity, and language all interfere with the speech signal, which makes it hard to represent the emotional information it contains and, as a result, restricts the generalization of speech emotion recognition systems. In this regard, cross-corpus speech emotion recognition trains and tests models on different databases, with the aim of improving generalization in the wild. To minimize the discrepancy between corpora, this thesis combines domain adaptation with deep learning and carries out in-depth research on a key issue of cross-corpus SER, namely cross-corpus feature alignment. The major contributions are as follows.

(1) We propose a novel Global Local Adversarial Network (GLAN) that extracts discriminative and generalized speech emotion features and models the SER problem in terms of sequential patterns. We propose a novel feature extraction method based on global, local, and hybrid timescales that blends the merits of handcrafted features and deep features. To obtain discriminative speech features, we also use an attention network to select the emotion-related parts of the speech signal. In addition, to obtain generalized speech features, domain discriminators at hierarchical levels are introduced into the emotion recognition framework to reduce the gap between the source domain and the target domain at the global, local, and hybrid levels.

(2) We propose a cross-corpus SER method based on Conditional Adversarial Domain Adaptation. To eliminate the differences in emotional speech features across databases while preserving the discriminability of the features, the method builds on work (1) by introducing both the feature representation and the prediction information into domain adaptation, and it effectively captures the interaction between them. Specifically, a conditional discriminator is introduced to distinguish the cross-covariance of speech features and emotion prediction information between the source and target domains. An emotion prediction network predicts emotion categories from the speech features in order to capture discriminative emotional speech features. The two modules are trained jointly in an adversarial manner. The method uses the correlation between speech features and predicted label information to characterize the structure of the speech emotion categories, and it achieves more accurate matching of the domain feature distributions.

(3) We build a speech emotion recognition system that provides speech audio playback, speech feature extraction, and speech emotion recognition. It can play speech data and display the extracted spectrogram features and the emotion recognition result.
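The conditioning step described in contribution (2) can be illustrated with a small sketch. It is not the thesis implementation: it assumes the "cross-covariance" input is formed as the flattened outer product of a feature vector and the predicted class probabilities, as is common in conditional adversarial domain adaptation. The function names, the toy values, and the fixed linear discriminator are all hypothetical.

```python
import math


def softmax(logits):
    """Turn emotion logits into class probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multilinear_map(features, probs):
    """Flattened outer product of a feature vector and class probabilities.

    Entry (i, j) is features[i] * probs[j], so the discriminator sees how
    each feature co-varies with each predicted emotion class.
    """
    return [f * p for f in features for p in probs]


def domain_bce(logit, is_source):
    """Binary cross-entropy on a domain logit (1 = source, 0 = target)."""
    label = 1.0 if is_source else 0.0
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


# Hypothetical toy values: a 3-dimensional speech feature and 4 emotion classes.
features = [0.2, -1.0, 0.5]
probs = softmax([2.0, 0.1, -0.3, 0.0])
joint = multilinear_map(features, probs)  # length 3 * 4 = 12

# Hypothetical linear discriminator with fixed weights, for illustration only.
weights = [0.1] * len(joint)
logit = sum(w * x for w, x in zip(weights, joint))
loss = domain_bce(logit, is_source=True)
```

In adversarial training the discriminator would be updated to reduce this loss while the feature extractor is updated to increase it, for example through a gradient reversal layer. The training loop is omitted here.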

Keywords/Search Tags:

speech emotion recognition, domain adaptation, adversarial training, deep neural network

Related items

1	Research And Application Of Unsupervised Domain Adaptation Algorithm Based On Adversarial Training
2	Speech Emotion Recognition Via Domain Adaptation
3	Research And Implementation Of Speech Emotion Recognition Based On Transfer Learning
4	Speaker Adaptation Of DNN-HMM Acoustic Model For Speech Recognition
5	The Research Of Cross-user Activity Recognition Based On Deep Learning And Unsupervised Domain Adaptation
6	Research On Key Technologies Of Speech Emotion Recognition
7	Research On Speech Emotion Recognition Model Based On Deep Neural Network
8	Research On Speech Emotion Recognition Based On Deep Neural Network
9	Research Of Speech Emotion Recognition Based On Deep Neural Network
10	Research On Feature Fusion Method Of Speech Emotion Recognition Based On Deep Learning