
Research On Cross-corpus Speech Emotion Recognition Technology Based On Transfer Learning

Posted on: 2024-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: M Zhao
Full Text: PDF
GTID: 2568307136493304
Subject: Electronic information
Abstract/Summary:
Speech emotion recognition helps to realize natural human-computer interaction. To obtain an emotion recognition system that generalizes better across languages, this thesis studies the impact of multilingual corpora on speech emotion recognition performance. Starting from multilingual and cross-language emotion recognition, and drawing on the idea of transfer learning in deep networks, it proposes a multilingual speech emotion recognition system based on a multi-task residual dilated causal convolutional network and a cross-corpus speech emotion recognition method based on a deep domain adaptive CNN decision tree. The research content and innovations of this thesis are as follows:

(1) First, the thesis introduces the research background and the current state of research on speech emotion recognition in China and abroad. It then summarizes traditional speech emotion recognition systems and describes each module in detail, including emotion databases, speech signal preprocessing techniques, traditional acoustic features, and feature preprocessing methods. Finally, it presents the relevant theory of deep learning and transfer learning and their applications in emotion recognition, laying a theoretical foundation for the subsequent research.

(2) In multilingual speech emotion recognition, differences between speakers and other factors can affect recognition results. To improve the generalization performance of multilingual speech emotion recognition systems, this thesis proposes a multilingual speech emotion recognition system based on a multi-task residual dilated causal convolutional network. Emotion classification is the main task, and two auxiliary tasks are added: language classification and gender classification. The three tasks are learned jointly, exploiting the common and complementary information between them and supplying the main task with information that is useful for emotion classification. In addition, to account for the long-term trend of the emotional state carried in speech, the task-sharing layer uses a residual dilated causal convolutional network: three dilated causal convolutional blocks extract multi-scale emotional features, which are reused through residual connections, so that the long-term context of emotional features is modeled effectively. The experimental results show that the multilingual speech emotion recognition method based on the multi-task residual dilated causal convolutional network effectively improves speech emotion recognition performance.

(3) In cross-corpus speech emotion recognition, the mismatch between target-domain and source-domain samples leads to poor recognition performance. To improve cross-corpus performance, this thesis proposes a cross-corpus speech emotion recognition method based on deep domain adaptation and a convolutional neural network (CNN) decision tree model. First, a local feature transfer learning network based on jointly constrained deep domain adaptation is constructed. By minimizing the joint discrepancy between the target and source domains in the feature space and in Hilbert space, the correlation between the two corpora is mined and the transferable invariant features from the target domain to the source domain are learned. Then, to reduce the classification error between easily confused emotions in the cross-corpus setting, a CNN decision tree multi-level classification model is constructed based on the degree of confusion between emotions, so that the emotions are first classified coarsely and then finely. The experimental results show that, compared with the CNN baseline method, the performance of the cross-corpus speech emotion recognition method proposed in this thesis is greatly improved.
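The residual causal convolution blocks described in contribution (2) can be illustrated with a minimal sketch. It assumes they correspond to what is commonly called dilated causal convolution, and it is not the thesis's implementation: the function names, the kernel size of 2, the shared weights, the ReLU activation, and the dilation rates (1, 2, 4) are all assumptions chosen for illustration, and the multi-task heads are omitted.

```python
def dilated_causal_conv1d(x, weights, dilation):
    """Causal 1-D convolution: y[t] = sum_k weights[k] * x[t - k * dilation].

    Samples before t = 0 are treated as zero, so y[t] never depends on
    any input later than x[t].
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def residual_stack(x, weights, dilations=(1, 2, 4)):
    """Three blocks with growing dilation (assumed rates), each added back
    to its input through a residual connection after a ReLU."""
    h = list(x)
    for d in dilations:
        conv = dilated_causal_conv1d(h, weights, d)
        h = [a + max(0.0, b) for a, b in zip(h, conv)]
    return h


def receptive_field(kernel_size, dilations):
    """Number of past-and-present input frames one output frame can see."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 9
    print(residual_stack(impulse, [0.5, 0.5]))
    print(receptive_field(2, (1, 2, 4)))
```

Under these assumptions, doubling the dilation at each block lets the receptive field grow quickly with depth, which is one common way to model long-term context without recurrent layers; the residual additions keep the shorter-range features available to later blocks.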
Keywords/Search Tags: Cross-corpus speech emotion recognition, Transfer learning, Multi-task learning, Deep domain adaptation, Decision tree model