Speech Feature Encoding And Emotion Recognition Based On Auto Encoder

Posted on:2021-04-19

Degree:Master

Type:Thesis

Country:China

Candidate:X Z Zhong

GTID:2428330629451049

Subject:Communication and Information System

Abstract/Summary:

In recent years, a trend has emerged in speech-related research: in many specific tasks, the temporal convolution network (TCN) has outperformed modules based on the recurrent neural network (RNN). TCNs have achieved remarkable breakthroughs in speech synthesis, but a considerable gap remains in speech emotion recognition. Moreover, current research still needs substantial progress before it reaches the level of practical application. Considering overall performance, a method is needed that compresses the feature set and improves the training and processing speed of the classifier. Such a method is also necessary for moving speech emotion recognition research into the stage of representation learning.

The primary work of this thesis consists of the following two parts: (1) The temporal convolution network performs well and has been proven capable of effectively capturing long-distance dependencies in long sequences such as speech. Building on this, a novel method for extracting information from audio signals is proposed. It combines the temporal convolution network with the auto-encoder, a module widely used in feature extraction, feature processing, and representation learning, to extract new information not contained in traditional feature sets and thereby achieve better performance. (2) For the problem of feature processing, the adversarial auto-encoder (AAE) is applied to combine the traditional features with the new features and to learn a new representation of the original features. The new features should have as low a dimension as possible while still containing information that is discriminative for classification. By forcing the encoding of the original features to follow a preset prior distribution, the adversarial auto-encoder can learn a new representation of the original data while preserving its discriminability.

Experimental results show that, thanks to the temporal convolution network, new information not covered by the traditional feature set is extracted. The new method achieved 76.6% unweighted average recall on the RAVDESS data set. In addition, the adversarial auto-encoder compressed the hybrid feature set, which combines the traditional and new features, from 434 dimensions to 8 while still achieving 68.0% unweighted average recall.
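
The results are reported as unweighted average recall (UAR), which is the mean of the per-class recalls, so every emotion class counts equally regardless of how many samples it has. The following is a minimal sketch of that metric only, not the thesis's evaluation code; the function name and the toy labels are hypothetical and are not taken from RAVDESS.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Recall of a class = correct predictions / samples of that class.
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels, for illustration only.
y_true = ["happy", "happy", "happy", "happy", "sad", "sad"]
y_pred = ["happy", "happy", "happy", "sad", "sad", "angry"]

uar = unweighted_average_recall(y_true, y_pred)
print(f"UAR = {uar:.3f}")
```

In this toy case the recall is 3/4 for "happy" and 1/2 for "sad", so the UAR is 0.625, whereas plain accuracy would be 4/6. The difference illustrates why UAR is often preferred when classes are imbalanced.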

Keywords/Search Tags:

Auto-Encoder, Adversarial Auto-encoder, Temporal Convolution Network, Speech Emotion Recognition, Speech Feature Extraction

Related items

1	Deep Auto-encoder Framework For SAR Images Change Detection
2	Objective Evaluation Of Speech Quality Based On Stacked Auto-Encoder
3	Study Of Statistical Process Monitoring Method Based On Auto-Encoder
4	Research On Feature Fusion Method Of Speech Emotion Recognition Based On Deep Learning
5	Single-channel Speech Enhancement Based On Analysis-by-synthesis
6	A Hybrid Depth Network Learning Model Based On Auto-encoders
7	Research Into Speech Recognition Based On Deep Learning
8	DCGAN Image Generation Algorithm Based On Coded Feature Extraction And Application
9	Adversarial Auto-encoder For Open Set Recognition
10	Research On Any-to-any Emotional Voice Conversion Based On Variational Auto-encoder