
Research On Speech Emotion Feature Extraction And Processing Algorithm Based On Deep Learning

Posted on: 2021-05-02
Degree: Master
Type: Thesis
Country: China
Candidate: P X Jiang
Full Text: PDF
GTID: 2428330605952054
Subject: Signal and Information Processing
Abstract/Summary:
Language contains rich emotional information. In the past few decades, research on speech emotion recognition has made great progress, and in recent years deep learning has achieved great success in many fields. Compared with traditional hand-crafted features, deep-learned features carry more internal information; however, how to reasonably design the corresponding algorithms and model structures still needs to be explored. This thesis studies speech emotion feature extraction and processing algorithms based on deep learning. The main contents are as follows:

1. The research significance and background of speech emotion recognition are introduced, together with the domestic and foreign research status and existing problems, and the main work and organization of this thesis are described in detail.

2. The system pipeline of speech emotion recognition is studied, including the emotion description model, speech emotion databases, emotion feature extraction, and emotion classifiers.

3. A speech emotion recognition model based on the feature representation of a convolutional neural network (CNN) is proposed. Starting from the LeNet-5 architecture, the model adds one more convolution layer and pooling layer and replaces the two-dimensional convolution kernels with one-dimensional kernels. After preprocessing, the one-dimensional features are fed into the convolutional network to transform their representation, and a softmax classifier finally performs emotion classification. Recognition results on open databases verify the effectiveness of the network model.

4. A single network model has limited ability to learn features. To improve the model's learning of emotional features, a serial network model combining a convolutional neural network and a simple recurrent unit (SRU) is proposed. Segmented three-dimensional spectrogram features are used as the model input: a pre-trained CNN module learns these features, an SRU module fuses the features of the time-dependent segments, and a classifier finally performs emotion classification. Experimental results on the Emo-DB and CASIA databases show that the model can effectively identify the emotional information contained in speech.

5. Because the serial connection between modules may lose important emotional information during feature learning, a parallel network structure composed of a long short-term memory (LSTM) module and a CNN module is proposed. First, frame-level features of each speech segment are extracted and sent to the LSTM module for frame-by-frame learning. At the same time, the spectrogram of each segment is extracted to form a three-dimensional spectrogram feature (static, delta, and delta-delta), which is learned by the CNN module. The features extracted by the two modules are then fused and batch-normalized, and a softmax classifier finally performs emotion classification. Experimental results on the Emo-DB and CASIA databases show the superiority of the proposed method.
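The one-dimensional convolution, pooling, and softmax classification described in item 3 can be sketched in NumPy. This is only an illustrative toy, not the thesis's actual model: the feature length (40), kernel width (5), and the 7 emotion classes are hypothetical values chosen for the example.

```python
import numpy as np

def conv1d_valid(x, kernel):
    """'Valid' 1-D convolution (cross-correlation) of feature vector x with a kernel."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def max_pool1d(x, size=2):
    """Non-overlapping 1-D max pooling; a trailing remainder is dropped."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

def softmax(z):
    """Numerically stable softmax over class scores."""
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
feat = rng.standard_normal(40)       # a 1-D acoustic feature vector (hypothetical length)
kernel = rng.standard_normal(5)      # one 1-D convolution kernel
h = max_pool1d(np.maximum(conv1d_valid(feat, kernel), 0.0))  # conv -> ReLU -> pool
scores = rng.standard_normal((7, len(h))) @ h                # 7 emotion classes (assumed)
probs = softmax(scores)              # class probabilities for the segment
```

A real model would stack two such conv/pool stages (the LeNet-5 variant in the thesis adds one pair) and learn the kernels and classifier weights by backpropagation; here they are random only to show the data flow.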
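The three-dimensional spectrogram input of item 5 stacks the static log-spectrogram with its delta and delta-delta coefficients as three channels. A sketch using the standard regression-style delta formula (the 100-frame, 40-band segment size is an assumption for illustration, not the thesis's configuration):

```python
import numpy as np

def delta(feat, N=2):
    """Regression-based delta coefficients over the time axis (frames, bands)."""
    T = feat.shape[0]
    padded = np.pad(feat, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    return sum(n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
               for n in range(1, N + 1)) / denom

rng = np.random.default_rng(2)
log_mel = rng.standard_normal((100, 40))       # (frames, mel bands), hypothetical segment
d1 = delta(log_mel)                            # delta (first-order dynamics)
d2 = delta(d1)                                 # delta-delta (second-order dynamics)
spec_3d = np.stack([log_mel, d1, d2], axis=0)  # (3, frames, bands) channel-first CNN input
```

The CNN branch of the parallel model then treats `spec_3d` like a 3-channel image, while the LSTM branch consumes the frame-level features; their outputs are concatenated and batch-normalized before the softmax layer.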
Keywords/Search Tags:Speech emotion recognition, Deep learning, Feature extraction