
Research And Implementation Of Speech Emotion Recognition Based On CGRU Model

Posted on: 2021-08-01
Degree: Master
Type: Thesis
Country: China
Candidate: B Fu
Full Text: PDF
GTID: 2558306923950029
Subject: Control theory and control engineering
Abstract/Summary:
Speech emotion recognition is an important research direction in affective computing and human-computer interaction. An emotion recognition system analyzes the speaker's speech, extracts emotion-related features from it, and recognizes the emotion by distinguishing between the features of different emotions. Traditional methods extract acoustic features from speech that do not depend on the speaker or on the speech content. With the development of deep learning, deep neural networks are now widely used to extract higher-order speech features, and these features improve the accuracy of speech emotion recognition. However, which kind of neural network model to use and how to deal with the shortage of data remain open questions. This thesis studies these problems, and the main research contents are as follows.

First, based on acoustic theory, a variety of acoustic features are extracted from speech, including MFCC, Mel spectrum, and pitch. Speech emotion recognition is performed both with MFCC features alone and with a fusion of multiple features, and after fusion a random forest is used to select features from the multidimensional feature set. The experiments show that multi-feature fusion achieves higher recognition accuracy than a single acoustic feature. Using a random forest to select the emotion-related features from the fused set reduces the feature dimension, which helps decrease the training time and improve the recognition rate of the model.

Second, traditional feature extraction is highly dependent on the corpus, generalizes poorly, and is tedious, time-consuming, and demanding of acoustic knowledge. To address these problems, hierarchical learning is introduced into the modeling of the speech emotion recognition system: by combining one-dimensional convolution with the gated recurrent unit (GRU), this thesis designs a CGRU model to extract high-order speech features. The one-dimensional convolution extracts local features and abstracts them layer by layer, while the GRU learns the long-term dependencies in the feature sequences produced by the convolution, so as to obtain global features related to emotion. The CGRU model thus combines the advantages of the CNN and GRU models in emotion recognition. Experiments are carried out on three different emotion corpora. The results show that, compared with the traditional benchmark model, the accuracy of the CGRU model on the EMO-DB, SAVEE, and RAVDESS data sets improves by 5%, 3%, and 5%, respectively. In addition, when the CGRU model is used to extract features from the samples, a random forest is used for dimension reduction, and a support vector machine is trained on the result, the accuracy on the data sets improves by a further 2% on average over the CGRU model.

Finally, to address the CGRU model's large demand for data and the shortage of data samples, this thesis introduces augmentation of speech samples: the number of samples is expanded with techniques that include adding noise to the speech samples and changing their speed. Retraining the CGRU model on the expanded sample set improves its recognition performance. An online recognition system is constructed to verify the effectiveness of the CGRU model.
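
The sample expansion described in the abstract (adding noise and changing speed) could look roughly like the sketch below. The abstract does not give noise levels, speed factors, or the resampling method, so the function names, the signal-to-noise ratios, the speed rates, and the use of linear interpolation are all assumptions made for illustration.

```python
import numpy as np

def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise at roughly the requested SNR in dB (assumed noise type)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

def change_speed(signal, rate):
    """Resample by linear interpolation; rate > 1 shortens the clip (faster speech).
    This simple resampling also shifts the pitch."""
    n_out = int(round(len(signal) / rate))
    old_idx = np.arange(len(signal))
    new_idx = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(new_idx, old_idx, signal)

def augment(signal, rng, snr_levels=(20.0, 10.0), rates=(0.9, 1.1)):
    """Return the original clip followed by its noisy and speed-changed copies.
    The default SNR levels and rates are hypothetical, not values from the thesis."""
    out = [signal]
    out += [add_noise(signal, snr, rng) for snr in snr_levels]
    out += [change_speed(signal, r) for r in rates]
    return out

# Usage: expand one synthetic 1-second clip (16 kHz sine tone) into five samples.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clip = np.sin(2 * np.pi * 220.0 * t)
augmented = augment(clip, rng)
```

Each augmented copy keeps the emotion label of the clip it was derived from, so the training set grows without new recordings.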
Keywords/Search Tags:speech emotion recognition, mel frequency cepstrum coefficient, random forest, CGRU model, data augmentation
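
As an illustration of the CGRU idea summarized in the abstract (one-dimensional convolution for local features, then a GRU for long-term dependencies), a minimal forward pass might be sketched as follows. The abstract does not specify layer counts, sizes, or the output layer, so the single convolution layer, the dimensions, the softmax classifier, and all function and parameter names are assumptions. The weights are random and untrained.

```python
import numpy as np

def conv1d_relu(x, w, b):
    """Valid 1-D convolution over time, then ReLU.
    x: (T, C_in), w: (K, C_in, C_out), b: (C_out,) -> (T - K + 1, C_out)."""
    k = w.shape[0]
    t_out = x.shape[0] - k + 1
    out = np.stack([np.tensordot(x[t:t + k], w, axes=([0, 1], [0, 1]))
                    for t in range(t_out)])
    return np.maximum(out + b, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_last_state(seq, p):
    """Run a GRU over seq (T, D) and return the final hidden state (H,)."""
    h = np.zeros(p["Uz"].shape[0])
    for x_t in seq:
        z = sigmoid(x_t @ p["Wz"] + h @ p["Uz"] + p["bz"])  # update gate
        r = sigmoid(x_t @ p["Wr"] + h @ p["Ur"] + p["br"])  # reset gate
        h_cand = np.tanh(x_t @ p["Wh"] + (r * h) @ p["Uh"] + p["bh"])
        h = (1.0 - z) * h + z * h_cand  # one common convention for mixing old and new state
    return h

def init_params(rng, n_feat, n_filters, kernel, hidden, n_classes, scale=0.1):
    """Random, untrained parameters; all sizes are hypothetical."""
    p = {"conv_w": rng.normal(0.0, scale, (kernel, n_feat, n_filters)),
         "conv_b": np.zeros(n_filters)}
    for g in "zrh":
        p["W" + g] = rng.normal(0.0, scale, (n_filters, hidden))
        p["U" + g] = rng.normal(0.0, scale, (hidden, hidden))
        p["b" + g] = np.zeros(hidden)
    p["out_w"] = rng.normal(0.0, scale, (hidden, n_classes))
    p["out_b"] = np.zeros(n_classes)
    return p

def cgru_forward(x, p):
    """Conv1D -> GRU -> softmax over emotion classes."""
    local = conv1d_relu(x, p["conv_w"], p["conv_b"])  # local features per time window
    h = gru_last_state(local, p)                      # global summary of the sequence
    logits = h @ p["out_w"] + p["out_b"]
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Usage: 100 frames of 40 features each (for example MFCCs), 7 assumed emotion classes.
rng = np.random.default_rng(0)
params = init_params(rng, n_feat=40, n_filters=16, kernel=5, hidden=32, n_classes=7)
frames = rng.normal(size=(100, 40))
probs = cgru_forward(frames, params)
```

A model matching the abstract's "layer by layer" description would stack several convolution layers before the GRU and would be trained end to end in a deep learning framework; this sketch only shows how data flows through the two stages.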