
Design And Implementation Of Speech Emotion Recognition Algorithm Based On Deep Learning

Posted on: 2022-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: H L Wu
Full Text: PDF
GTID: 2518306320990279
Subject: Electronics and Communications Engineering

Abstract/Summary:
With the rapid development of technology and artificial intelligence, users' demand for human-computer interaction keeps growing. Since emotional information is the basis of communication, people hope that machines can perceive emotions and thus provide consumers with better services. As a key technology of human-computer interaction, speech emotion recognition has practical significance in many fields such as medical treatment and education. Therefore, this paper studies speech emotion recognition from two perspectives: optimizing speech features and building a good emotion recognition model. The main work is as follows:

In feature extraction, Mel cepstrum feature parameters (MFCC) are used. In view of their poor resolution in the middle and high frequency bands, the I-MFCC and Mid-MFCC feature parameters are introduced, and the contribution of the three kinds of Mel cepstrum coefficients to speech emotion recognition is measured with the Fisher ratio criterion, a dimensionality reduction method. The 12th-order parameters with the highest contribution are fused to obtain the F-MFCC feature parameters. These are then fused with the short-time energy, pitch frequency, formants and other feature parameters of the speech signal, yielding feature parameters with comprehensive information for the emotion recognition experiments.

On the basis of the improved feature parameters, a speech emotion recognition algorithm based on deep learning is proposed, and a speech emotion recognition model is built from a convolutional neural network (CNN), a bidirectional long short-term memory network (Bi-LSTM) and a multi-head attention mechanism. The CNN extracts high-level feature vectors from each frame of the speech signal and feeds them into the Bi-LSTM + multi-attention module, in which the attention mechanism learns weights for the speech feature parameters in multiple different subspaces and the weighted features are passed to the Bi-LSTM network for bidirectional analysis and speech emotion recognition.

To verify the actual effect of the optimized Mel cepstrum coefficients and the deep learning network in speech emotion recognition, tests are conducted on the IEMOCAP and CASIA emotion datasets. The experimental results show that the improved Mel cepstrum feature parameters and the deep learning model can effectively improve speech emotion recognition performance.
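
As an illustration of the Fisher-ratio selection step described above, the following Python sketch ranks feature dimensions by the ratio of between-class to within-class variance and keeps the 12 highest-ranked ones. The function names, array layout and the exact form of the ratio are illustrative assumptions, not code from the thesis.

import numpy as np

def fisher_ratio(features, labels):
    # features: (n_frames, n_dims) concatenated MFCC / I-MFCC / Mid-MFCC coefficients
    # labels:   (n_frames,) integer emotion labels
    classes = np.unique(labels)
    overall_mean = features.mean(axis=0)
    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for c in classes:
        x_c = features[labels == c]
        between += len(x_c) * (x_c.mean(axis=0) - overall_mean) ** 2
        within += len(x_c) * x_c.var(axis=0)
    # Higher ratio = better class separability for that dimension.
    return between / (within + 1e-12)

def select_top_dims(features, labels, k=12):
    # Indices of the k dimensions that contribute most to emotion discrimination,
    # mirroring the 12th-order selection used to build F-MFCC.
    return np.argsort(fisher_ratio(features, labels))[::-1][:k]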
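
The PyTorch sketch below shows one plausible realization of the CNN + Bi-LSTM + multi-head attention architecture described above, with attention applied to the frame-level CNN features before the Bi-LSTM. The layer sizes (64 CNN channels, 128 LSTM units, 4 attention heads) and the mean pooling are illustrative assumptions, not the configuration reported in the thesis.

import torch
import torch.nn as nn

class SpeechEmotionNet(nn.Module):
    def __init__(self, n_features, n_classes, hidden=128, heads=4):
        super().__init__()
        # CNN: extracts a higher-level feature vector for every frame.
        self.cnn = nn.Sequential(
            nn.Conv1d(n_features, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Multi-head attention: learns weights over different feature subspaces.
        self.attn = nn.MultiheadAttention(embed_dim=64, num_heads=heads, batch_first=True)
        # Bi-LSTM: analyses the weighted frame sequence in both directions.
        self.bilstm = nn.LSTM(64, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                                  # x: (batch, frames, n_features)
        h = self.cnn(x.transpose(1, 2)).transpose(1, 2)    # (batch, frames, 64)
        h, _ = self.attn(h, h, h)                          # frame-level self-attention
        h, _ = self.bilstm(h)                              # (batch, frames, 2 * hidden)
        return self.classifier(h.mean(dim=1))              # pooled logits over emotion classes

# Example with illustrative sizes: 4 emotion classes, 36-dimensional fused features.
# logits = SpeechEmotionNet(n_features=36, n_classes=4)(torch.randn(8, 200, 36))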
Keywords/Search Tags: Speech emotion recognition, Fisher ratio criterion, Deep learning, Convolutional neural network, Multi-attention mechanism