Research And Implementation Of Speech Emotion Recognition Based On Transfer Learning

Posted on: 2021-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Xue
GTID: 2428330629487263
Subject: Software engineering
Abstract/Summary:
In people's daily interactions, emotions play an important role, helping people understand one another's state of mind and behavior. Likewise, this information is critical for sustaining long-term interaction between humans and machines. Automatic speech emotion recognition (SER) has been explored by researchers as a way to bridge the communication gap between humans and computers. Traditional SER methods have proved effective when trained and tested on the same dataset, but they are often unsatisfactory when applied to datasets not seen during training, because speech data collected from different devices or environments differ widely in language, type of emotion (e.g., acted, elicited, or spontaneous), and labeling scheme. In that case the training and test sets have different data distributions, and traditional SER approaches cannot handle the problem well. As a learning method for addressing inconsistent data distributions, transfer learning has been widely used in speech recognition, image processing, video analysis, and other fields. Based on transfer learning, this thesis studies methods for multilingual and cross-corpus speech emotion recognition. The specific research contents are as follows:

(1) A multilingual speech emotion recognition method based on multi-task attention is proposed to address the low performance of multilingual speech emotion recognition. By introducing language identification as an auxiliary task, the model learns not only the emotion features shared across languages but also the emotion characteristics unique to each language, which improves the generalization ability of the multilingual emotion recognition model. Experiments on dimensional affective corpora in two languages show that the proposed method improves the mean relative UAR on the valence and arousal tasks by 3.66%~5.58% and 1.28%~6.51%, respectively, compared with the benchmark methods. Experiments on discrete affective corpora in four languages show that the mean relative UAR improves by 13.43%~15.75% compared with the benchmark methods. The proposed method can therefore effectively extract language-related emotion features and improve the performance of multilingual emotion recognition.

(2) A cross-corpus speech emotion recognition method based on adversarial training is proposed to address the low recognition rate caused by the inconsistent distributions of the training and test sets. Through adversarial training on the corpora, the method effectively eliminates the differences between corpora and improves the extraction of domain-invariant emotion features. In addition, a multi-head attention mechanism models the relative dependence between elements at different positions in the speech sequence, which strengthens the extraction of emotion-salient features. With IEMOCAP as the source domain and MSP-IMPRO as the target domain, the results exceed the benchmark methods by about 0.91%~12.22%. With MSP-IMPRO as the source domain and IEMOCAP as the target domain, the results exceed the benchmark methods by about 2.27%~6.89%. Therefore, when emotion labels for the target domain are unavailable, the proposed method is better at extracting domain-invariant, emotion-salient features.

(3) A prototype speech emotion recognition system based on transfer learning is designed and implemented using the Python programming language, the PyQt5 user interface toolkit, and the Keras and PyTorch deep learning frameworks. The system consists of three parts: an acoustic feature extraction and analysis module, a multilingual speech emotion recognition module based on multi-task attention, and a cross-corpus speech emotion recognition module based on adversarial training. The multilingual and cross-corpus methods proposed in this thesis are implemented and verified in the prototype system, which can intuitively demonstrate the usability and effectiveness of the proposed methods.
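
The results in the abstract are reported as relative UAR, which the abstract itself does not define. Assuming UAR means unweighted average recall (the mean of per-class recalls, the usual reading in speech emotion recognition) and that "relative" means the percentage gain over a baseline score, the metric can be sketched as follows. The function names and the toy labels are illustrative only and are not taken from the thesis.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally regardless of size."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly recognised samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def relative_improvement(baseline, proposed):
    """Percentage gain of `proposed` over `baseline` (assumed meaning of 'relative')."""
    return (proposed - baseline) / baseline * 100.0


# Toy, made-up labels: an imbalanced three-class emotion task.
y_true = ["angry", "angry", "happy", "happy", "happy", "happy", "sad", "sad"]
baseline_pred = ["angry", "happy", "happy", "happy", "happy", "happy", "happy", "sad"]
proposed_pred = ["angry", "angry", "happy", "happy", "happy", "sad", "sad", "sad"]

baseline_uar = unweighted_average_recall(y_true, baseline_pred)
proposed_uar = unweighted_average_recall(y_true, proposed_pred)
gain = relative_improvement(baseline_uar, proposed_uar)
print(f"baseline UAR={baseline_uar:.4f}, proposed UAR={proposed_uar:.4f}, gain={gain:.2f}%")
```

Because each class contributes equally, a model that favors the majority emotion scores lower on UAR than on plain accuracy, which is one reason the metric is common for imbalanced emotion corpora.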
Keywords/Search Tags:Speech emotion recognition, Multi-task learning, Domain adaptation, Convolutional neural network, Recurrent neural network