
Research On Feature Extraction And Classification Of Speech Emotion

Posted on: 2020-06-23
Degree: Master
Type: Thesis
Country: China
Candidate: W W Wang
Full Text: PDF
GTID: 2428330623957516
Subject: Electronics and Communications Engineering
Abstract/Summary:
Emotion recognition is a research hotspot in multimedia information processing, pattern recognition, and computer vision. With the development of deep learning and artificial intelligence, emotion recognition has become key to human-computer interaction and has received extensive attention from researchers. Emotions can be expressed in many ways; among them, facial expression and speech are the two most important carriers, so research on bimodal emotion recognition based on facial expression and speech has important practical significance. This thesis focuses on the facial and speech modalities, studying the speech emotion features used for classification as well as the application of machine learning and deep learning to speech emotion recognition. The main contributions are as follows:

(1) To improve the performance of emotion features in traditional speech emotion recognition, this thesis proposes a speech emotion recognition method based on variational mode decomposition (VMD). The emotional speech signal is first decomposed by VMD into intrinsic mode functions (IMFs); the dominant IMFs are then selected and re-aggregated, and the Mel-frequency cepstral coefficients (MFCC) and the Hilbert marginal spectrum of each IMF are extracted as features. To verify the proposed features, experiments were conducted on two speech databases (EMO-DB and RAVDESS), with emotion classification performed by an extreme learning machine (ELM). The experimental results show that, compared with features based on EMD and EEMD, the proposed features achieve better recognition performance, verifying the practicability of the method.

(2) To improve the accuracy of speech emotion recognition, this thesis optimizes the self-attention mechanism and proposes a recognition method that combines a multi-layer self-attention mechanism with a bidirectional long short-term memory (BiLSTM) network for classification. As a time series, the speech signal has strong temporal correlation; to exploit the correlation between earlier and later parts of a speech sequence, a BiLSTM-based speech emotion recognition method is studied. Further, combining the advantages of BiLSTM networks and convolutional neural networks, a speech emotion recognition method based on a BiLSTM network and deep scattering spectrum features is proposed.

(3) Taking facial expression and speech as the two research modalities, feature-level fusion algorithms, including kernel canonical correlation analysis, kernel matrix fusion, and kernel cross-modal factor analysis, are analyzed and compared with a weighted decision-level fusion algorithm. A bimodal emotion dataset based on the SAVEE database was selected for experimental verification. The experimental results show that the bimodal emotion recognition results obtained by the fusion methods are significantly better than the single-modal results.
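The VMD front end described in (1) ends with selecting the dominant IMFs and re-aggregating them before feature extraction. A minimal numpy sketch of that selection step is below; the VMD decomposition itself is assumed to come from an external implementation (for example the third-party `vmdpy` package, an assumption, not something the thesis specifies), and the function name and the correlation-based ranking criterion are hypothetical illustrations of one common way to pick "dominant" modes.

```python
import numpy as np

def select_and_aggregate_imfs(signal, imfs, k=3):
    """Rank IMFs by absolute correlation with the original signal,
    keep the k most correlated ("dominant") modes, and sum them into
    a re-aggregated signal for downstream MFCC / Hilbert features."""
    corrs = np.array([abs(np.corrcoef(signal, imf)[0, 1]) for imf in imfs])
    dominant = np.argsort(corrs)[::-1][:k]      # indices of the top-k IMFs
    aggregated = imfs[dominant].sum(axis=0)     # re-aggregated signal
    return aggregated, dominant

# Toy demo: a two-tone signal plus noise, with hand-made stand-in "modes"
# in place of real VMD output.
t = np.linspace(0, 1, 1000)
low = np.sin(2 * np.pi * 5 * t)
high = 0.5 * np.sin(2 * np.pi * 50 * t)
noise = 0.01 * np.random.default_rng(0).standard_normal(t.size)
signal = low + high + noise
imfs = np.stack([high, low, noise])
agg, idx = select_and_aggregate_imfs(signal, imfs, k=2)
```

In this toy case the two tonal components correlate far more strongly with the mixture than the noise mode does, so they are the ones kept and summed.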
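Contribution (2) layers self-attention on top of BiLSTM frame outputs. The following is a minimal numpy sketch of one scaled dot-product self-attention layer as it might sit on a sequence of frame-level features; the BiLSTM itself, the multi-layer stacking, and all names and dimensions here are illustrative assumptions, not the thesis's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a frame sequence X (T x d).
    Each output frame is a weighted mix of all frames, letting the layer
    emphasise the emotionally salient parts of an utterance."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (T, T) pairwise similarities
    A = softmax(scores, axis=-1)              # attention weights, rows sum to 1
    return A @ V, A

rng = np.random.default_rng(0)
T, d = 8, 16                                  # 8 frames of 16-dim features
X = rng.standard_normal((T, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out, A = self_attention(X, Wq, Wk, Wv)
```

Stacking several such layers (each attending over the previous layer's output) gives the multi-layer variant the abstract refers to.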
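Among the fusion schemes compared in (3), the weighted decision-level fusion is the simplest to sketch: the per-class posteriors from the face and speech classifiers are combined with a convex weight before taking the arg-max. The weight value and class layout below are hypothetical.

```python
import numpy as np

def weighted_decision_fusion(p_face, p_speech, w_face=0.5):
    """Convex combination of the two modalities' class posteriors,
    followed by an arg-max to pick the fused emotion label."""
    p = w_face * p_face + (1.0 - w_face) * p_speech
    return p, int(np.argmax(p))

# Toy posteriors over 4 emotion classes (e.g. angry/happy/neutral/sad).
p_face = np.array([0.10, 0.60, 0.20, 0.10])
p_speech = np.array([0.05, 0.30, 0.50, 0.15])
fused, label = weighted_decision_fusion(p_face, p_speech, w_face=0.6)
```

Because the combination is convex, the fused vector is still a valid probability distribution; in practice the weight would be tuned on a validation set.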
Keywords/Search Tags: bimodal emotion recognition, VMD, speech emotion recognition, recurrent neural network, feature fusion