Speech Feature Encoding And Emotion Recognition Based On Auto Encoder

Posted on: 2021-04-19  Degree: Master  Type: Thesis
Country: China  Candidate: X Z Zhong  Full Text: PDF
GTID: 2428330629451049  Subject: Communication and Information System
Abstract/Summary:
In recent years, speech research has shown a clear trend: in many specific tasks, the temporal convolution network (TCN) has outperformed modules based on the recurrent neural network (RNN). It has achieved remarkable breakthroughs in speech synthesis, but a considerable gap remains in speech emotion recognition. Moreover, current research still needs substantial progress before it reaches the level of practical application. Considering overall performance, a method is needed that compresses the feature set and improves the training and processing speed of the classifier. Such a method is also necessary for pushing speech emotion recognition research into the stage of representation learning. The primary work of this thesis consists of the following two parts:

(1) Because the temporal convolution network performs outstandingly and has been proven capable of effectively capturing long-distance dependencies in long sequences such as speech, a novel method for extracting information from audio signals is proposed. It combines the temporal convolution network with the auto-encoder, a module widely used in feature extraction, feature processing, and representation learning, to extract new information that is not contained in traditional feature sets and thereby achieve better performance.

(2) For the problem of feature processing, the adversarial auto-encoder (AAE) is applied to combine the traditional features with the new features and to learn a new representation of the original features. The new features should have as low a dimension as possible while still containing information that is distinguishable for classification. By forcing the original features into a preset prior distribution, the adversarial auto-encoder can learn a new representation of the original data while preserving its discriminative power.

Experimental results show that, thanks to the temporal convolution network, new information not covered by the traditional feature set is extracted. The new method achieved an unweighted average recall of 76.6% on the RAVDESS data set. In addition, the adversarial auto-encoder compressed the hybrid feature, which combines the traditional and new features, from a dimension of 434 to 8 while still achieving an unweighted average recall of 68.0%.
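Both results above are reported as unweighted average recall (UAR), the mean of the per-class recalls, in which every emotion class counts equally regardless of how many samples it has. The following is a minimal sketch of that metric only, not the thesis's evaluation code; the function name and the toy labels are hypothetical and are not data from RAVDESS.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls over the classes present in y_true."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    # Recall of each class, then a plain (unweighted) average across classes.
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels for illustration only.
y_true = ["angry", "angry", "angry", "angry", "sad", "sad"]
y_pred = ["angry", "angry", "angry", "sad", "sad", "happy"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

In this toy case the recall is 3/4 for "angry" and 1/2 for "sad", so the UAR is 0.625 while plain accuracy is 4/6. The difference shows why UAR is often preferred when classes are imbalanced.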
Keywords/Search Tags: Auto-Encoder, Adversarial Auto-Encoder, Temporal Convolution Network, Speech Emotion Recognition, Speech Feature Extraction