Speech Emotion Recognition Based On Deep Separable Convolution And Cross Corpus

Posted on: 2022-04-28  Degree: Master  Type: Thesis
Country: China  Candidate: Y Zhong
GTID: 2518306539498414  Subject: Engineering
Abstract/Summary:
One of the major challenges in Speech Emotion Recognition (SER) is building a lightweight model from limited training data. In this paper, we propose a lightweight architecture with few parameters, based on separable convolution and inverted residuals. Speech samples are often annotated by multiple raters. Sentences with clear emotional content are annotated consistently (easy samples), whereas sentences with ambiguous emotional content show substantial disagreement between individual raters (hard samples). We assume that samples that are hard for humans are also hard for computers. We address this problem with Focal loss, which focuses learning on hard samples and down-weights easy samples. By incorporating an attention mechanism, the proposed network increases the weight given to emotion-salient information. The proposed model achieves unweighted accuracy (UA) of 71.72% and 90.1% on the well-known corpora IEMOCAP and Emo-DB, respectively. To our knowledge, the existing model with the fewest parameters is still almost 5 times the size of our proposed model.

The main issue in cross-corpus speech emotion recognition is that samples from different databases differ greatly in their feature-space distributions. We propose a cross-corpus model for speech emotion recognition based on effective channel attention, which models the interdependence between feature channels to obtain a better feature representation. Although samples from different databases differ greatly in feature-space distribution, samples of the same category still share similarities in feature space. We therefore add a center loss term that pulls features of the same category closer together and pushes features of different categories farther apart, which improves the recognition ability of the model. Because speech emotion databases differ so widely, the performance of cross-corpus speech emotion recognition is poor. To improve the generalization ability of the cross-corpus speech emotion recognition model, we train it with an incremental learning method. The method used in this paper achieves a large improvement in speech emotion recognition across the three databases IEMOCAP, MSP-IMPROV and MES-P.
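
The down-weighting of easy samples mentioned above can be illustrated with a minimal sketch of the standard multi-class focal loss, FL(p_t) = -(1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The abstract does not give the thesis's exact formulation or hyperparameters, so the value gamma = 2.0, the four-class logits, and the function names below are assumptions chosen purely for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Plain cross-entropy for one sample: -log(p_t)."""
    return -math.log(softmax(logits)[target])


def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one sample: -(1 - p_t)^gamma * log(p_t).

    gamma is an assumed illustrative value; gamma = 0 recovers cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# Hypothetical logits for a four-class emotion task; the true class is index 0.
easy_sample = [4.0, 0.0, 0.0, 0.0]   # confident, correct prediction
hard_sample = [0.5, 0.4, 0.3, 0.2]   # ambiguous prediction

for name, logits in (("easy", easy_sample), ("hard", hard_sample)):
    ce = cross_entropy(logits, 0)
    fl = focal_loss(logits, 0)
    print(f"{name}: CE={ce:.4f}  FL={fl:.4f}  FL/CE={fl / ce:.4f}")
```

In this sketch the ratio FL/CE equals (1 - p_t)^gamma, so the confidently classified sample contributes only a small fraction of its cross-entropy loss while the ambiguous sample keeps a much larger share, which is the behavior the abstract relies on.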
Keywords/Search Tags:Speech emotion recognition, Lightweight, Effective channel attention, Cross-corpus, Incremental learning