Research On Speech Emotion Recognition Technology Based On Deep Learning

Posted on: 2022-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: S S Tang
GTID: 2518306575464224
Subject: Electronic Science and Technology
Abstract/Summary:
Speech emotion recognition aims to let computers recognize emotions from a speaker's speech signal, enabling more natural and human-like human-computer interaction. The technology is widely used in intelligent education, intelligent medical care, and safe driving, which gives it clear research value. Current research focuses on two aspects, feature extraction and the recognition model, and the quality of each directly affects how well a system recognizes speech emotion. Addressing both aspects, this thesis designs a speech emotion recognition system based on deep learning.

First, the current state of research at home and abroad on speech emotion feature extraction and emotion recognition models is reviewed. The theory of speech emotion recognition and two commonly used deep neural networks are then described. By analyzing the problems of current research methods, the research objectives of this thesis are clarified and the scheme of the speech emotion recognition system is designed.

Second, to address the low recognition rate caused by the incomplete texture feature information of the traditional speech spectrogram, this thesis proposes a Spectrum Diagram based on Time Modulated Signal (SDTMS) feature extraction method. The method consists mainly of an auditory filterbank, a temporal envelope extraction filterbank, and a modulation filterbank, and it extracts from the speech signal a three-dimensional speech spectrogram composed of time, acoustic frequency, and modulation frequency. Because two-dimensional neural networks make insufficient use of emotional feature information, a Three Dimensional Convolutional Recurrent Neural Network (3DCRNN) model is designed for classification and recognition. The features extracted in this thesis are compared experimentally with other features, and the results demonstrate that SDTMS features effectively improve the recognition rate.

Third, to address vanishing or exploding gradients and network performance degradation when CNN models are deepened, this thesis proposes a Three Dimensional Attention Convolution Recurrent Neural Network based on Residual Network (Res3DACRNN) model. The method takes the speech spectrogram as input and uses the skip connections of the residual network to pass shallow features into the deep network, which solves the vanishing or exploding gradient problem and thus improves the recognition rate of speech emotion. Compared with other methods, the proposed method achieves better recognition performance.

Finally, a speech emotion recognition system based on deep learning is designed using the proposed SDTMS features and the Res3DACRNN model. To verify the system's recognition rate for Chinese emotions, experiments are conducted and analyzed on the CASIA dataset and a self-built dataset. The experimental results demonstrate that the system has good performance and strong generalization ability, and that the speech emotion recognition rate is improved.
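The three-dimensional feature described in the abstract (time, acoustic frequency, modulation frequency) can be illustrated with a minimal sketch. This is not the thesis's SDTMS implementation: equal-width groups of short-time DFT bins stand in for the auditory filterbank, the summed bin magnitudes stand in for the envelope extraction stage, and a DFT over blocks of envelope frames stands in for the modulation filterbank. All function names and parameter values (`modulation_cube`, `frame_len`, `hop`, `n_bands`, `seg_frames`) are hypothetical and chosen only for illustration.

```python
import cmath
import math


def dft_mag(seq):
    """Magnitudes of DFT bins 0..N//2 of a real sequence (naive O(N^2) DFT)."""
    n = len(seq)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(seq)))
        for k in range(n // 2 + 1)
    ]


def band_envelopes(signal, frame_len, hop, n_bands):
    """Per-frame envelope of each acoustic band.

    Assumption: equal-width groups of DFT bins replace a real auditory filterbank.
    """
    width = (frame_len // 2) // n_bands  # DFT bins per band, DC bin skipped
    envs = [[] for _ in range(n_bands)]
    for start in range(0, len(signal) - frame_len + 1, hop):
        mags = dft_mag(signal[start:start + frame_len])
        for b in range(n_bands):
            lo = 1 + b * width
            envs[b].append(sum(mags[lo:lo + width]))
    return envs


def modulation_cube(signal, frame_len=40, hop=20, n_bands=4, seg_frames=50):
    """Return cube[segment][band][modulation_bin].

    Each time segment holds seg_frames envelope frames; the DFT of each band's
    envelope within a segment gives its modulation spectrum.
    """
    envs = band_envelopes(signal, frame_len, hop, n_bands)
    n_segs = len(envs[0]) // seg_frames
    return [
        [dft_mag(env[s * seg_frames:(s + 1) * seg_frames]) for env in envs]
        for s in range(n_segs)
    ]


# Synthetic input: a 1000 Hz tone whose amplitude is modulated at 8 Hz.
fs = 4000
sig = [
    (1 + 0.8 * math.cos(2 * math.pi * 8 * n / fs)) * math.sin(2 * math.pi * 1000 * n / fs)
    for n in range(fs)
]
cube = modulation_cube(sig)

# Strongest non-DC modulation bin in the band that contains the 1000 Hz carrier.
mod = cube[0][1]
peak_bin = max(range(1, len(mod)), key=lambda k: mod[k])
```

With these illustrative settings the envelope is sampled at 200 frames per second and each segment spans 50 frames, so each modulation bin is 4 Hz wide. A tensor of this shape is the kind of input a 3D convolutional front end would consume; a practical system would use a proper auditory filterbank and an FFT library rather than the naive DFT shown here.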
Keywords/Search Tags: speech emotion recognition, speech spectrogram, 3DCNN, residual network, attention mechanism