Speech Emotion Recognition Based On Deep Separable Convolution And Cross Corpus

Posted on:2022-04-28

Degree:Master

Type:Thesis

Country:China

Candidate:Y Zhong

Full Text:PDF

GTID:2518306539498414

Subject:Engineering

Abstract/Summary:

One of the major challenges in speech emotion recognition (SER) is building a lightweight model with limited training data. In this paper, we propose a lightweight architecture with few parameters, based on separable convolution and inverted residuals. Speech samples are often annotated by multiple raters. While sentences with clear emotional content are annotated consistently (easy samples), sentences with ambiguous emotional content show substantial disagreement between individual evaluations (hard samples). We assume that samples that are hard for humans are also hard for computers. We address this problem with focal loss, which focuses learning on hard samples and down-weights easy samples. By incorporating an attention mechanism, the proposed network gives greater weight to emotion-salient information. The proposed model achieves unweighted accuracy (UA) of 71.72% and 90.1% on the well-known corpora IEMOCAP and Emo-DB, respectively. To our knowledge, the existing model with the fewest parameters is still almost 5 times the size of our proposed model.

The main issue in cross-corpus speech emotion recognition is that samples from different databases differ greatly in their feature-space distributions. We propose a cross-corpus model for speech emotion recognition based on effective channel attention, which establishes interdependence between feature channels for better feature representation. Although samples from different databases differ greatly in feature-space distribution, samples of the same category still share similarities in feature space. In this paper, a center loss function is added to pull features of the same category closer together and push features of different categories farther apart, which improves the recognition ability of the model. Because of the large differences between speech emotion databases, cross-corpus speech emotion recognition performs poorly. To improve the generalization ability of the cross-corpus speech emotion recognition model, an incremental learning method is adopted to train it. The method used in this paper achieves a large improvement in speech emotion recognition across the three databases IEMOCAP, MSP-IMPROV and MES-P.
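
The focal loss mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: it assumes the standard multi-class form, loss = -(1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The function name `focal_loss` and the default `gamma=2.0` are illustrative assumptions, since the abstract does not state the settings actually used.

```python
import numpy as np


def focal_loss(probs, labels, gamma=2.0):
    """Mean multi-class focal loss (illustrative sketch, not the thesis code).

    probs  : (N, C) array of predicted class probabilities (rows sum to 1).
    labels : (N,) array of integer class indices.
    gamma  : focusing parameter (assumed value); gamma = 0 gives cross-entropy.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Probability the model assigns to the true class of each sample.
    p_t = probs[np.arange(len(labels)), labels]
    # Clip to avoid log(0) for badly misclassified samples.
    p_t = np.clip(p_t, 1e-12, 1.0)
    # The factor (1 - p_t)^gamma shrinks the loss of confident (easy) samples.
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))


# An easy sample (true class predicted at 0.9) and a hard one (0.3).
easy = focal_loss([[0.9, 0.1]], [0])
hard = focal_loss([[0.3, 0.7]], [0])
```

With `gamma=2.0`, the easy sample's loss is scaled by (1 - 0.9)^2 = 0.01 relative to plain cross-entropy, while the hard sample's loss is scaled by only (1 - 0.3)^2 = 0.49, so hard samples dominate the training signal.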

Keywords/Search Tags:

Speech emotion recognition, Lightweight, Effective channel attention, Cross-corpus, Incremental learning

Related items

1	Research On Several Key Technologies In Cross-corpus Speech Emotion Recognition
2	Research On Cross-corpus Speech Emotion Recognition Based On Target Adaptation
3	Cross-corpus Speech Emotion Recognition Based On VGFCC Feature And Composite Network
4	Research On Key Techniques Of Speech Emotion Recognition
5	Research On Feature Fusion Method Of Speech Emotion Recognition Based On Deep Learning
6	Research On Transfer Subspace Learning For Speech Emotion Recognition
7	Study On Attention Based Speech Emotion Recognition
8	Research On Speech Emotion Recognition Based On Deep Learning
9	Research On Key Technologies Of Speech Emotion Recognition
10	Research On Speech Emotion Recognition Technology Based On Deep Learning