
Speech Emotion Recognition Modeling Research Based On Deep Learning

Posted on: 2020-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: W He
Full Text: PDF
GTID: 2428330575956408
Subject: Information and Communication Engineering
Abstract/Summary:
With the development of computer technology and the spread of artificial intelligence, speech emotion recognition has received extensive attention from both academia and industry. At present, most emotion recognition tasks rely on manually extracting a wide range of acoustic features, reducing their dimensionality, and constructing feature engineering pipelines to improve recognition results. This paper aims to explore how emotional information is expressed in speech, to understand what varies and what stays invariant in that information, to extract the essential emotional characteristics from speech, and to build the network structure best suited to representing them. Based on these research emphases, the paper comprises the following parts:

1. Study of an emotion recognition network based on traditional speech features. From a large pool of acoustic features, statistical analysis of the available data is used to screen acoustic and statistical descriptors and to build an effective, complete set of emotional features. From a physical standpoint, features that plausibly express emotion are selected and their effectiveness verified. From the standpoint of mathematical statistics, the chi-square test is used to select features, remove redundant information from the feature set, and improve network training efficiency, yielding a complete feature engineering pipeline (a feature-selection sketch follows this list).

2. Study of a deep learning emotion recognition network based on the speech spectrogram. The spectrogram contains almost all of the speech information: its two-dimensional structure reflects excitation-source characteristics such as harmonics and also supports analysis of vocal-tract characteristics such as the cepstrum and formants. Deep neural networks introduce nonlinearity and can learn representations of the input data on their own. A spectrogram-based deep learning emotion recognition network was built: a ResNet with local perception and skip connections was selected and improved at the level of its convolution kernels, and on this basis a ResNet-LSTM network was built to model the temporal sequence of the high-level emotional features learned by the ResNet (see the ResNet-LSTM sketch after this list).

3. Introduction of an attention mechanism to study the fusion of low-level descriptors and high-level semantic information. The set of traditional speech features that represent emotional information is fused with the high-level semantic information that the ResNet-LSTM network learns from the speech signal. The fused features are classified by a DNN, which increases the interpretability of the deep learning model and the scope for human assistance. In addition, an attention mechanism is introduced to explore key-frame information in speech: the learned attention is applied as weight coefficients to the manually extracted low-level descriptor features and used in the feature-fusion experiments (see the attention-fusion sketch after this list).
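The chi-square screening described in part 1 can be illustrated with a short sketch. This is not the thesis's actual pipeline: the feature matrix, feature count, and emotion labels below are synthetic placeholders, and scikit-learn's SelectKBest/chi2 stand in for whatever implementation the author used.

```python
# Hypothetical sketch of chi-square feature selection over utterance-level
# acoustic statistics (e.g. MFCC / pitch / energy functionals); the data
# here is synthetic and only illustrates the selection step.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(0)
X = rng.random((500, 384))          # 500 utterances x 384 statistical features
y = rng.integers(0, 4, size=500)    # 4 emotion classes (e.g. angry/happy/neutral/sad)

# chi2 requires non-negative inputs, so rescale each feature to [0, 1] first
X_scaled = MinMaxScaler().fit_transform(X)

# keep the k features whose chi-square statistic against the emotion label
# is largest, discarding redundant or uninformative descriptors
selector = SelectKBest(score_func=chi2, k=64)
X_selected = selector.fit_transform(X_scaled, y)

print(X_selected.shape)             # (500, 64)
```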
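A minimal sketch of the ResNet-LSTM idea from part 2, assuming a PyTorch implementation: a small ResNet-style CNN encodes the spectrogram, the time axis is kept as a sequence, and an LSTM models the temporal dynamics of the high-level features. The block sizes, spectrogram shape, and number of emotion classes are illustrative assumptions, not the exact network described in the thesis.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """Residual block: two 3x3 convolutions with a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)            # skip connection

class ResNetLSTM(nn.Module):
    def __init__(self, n_classes=4, cnn_channels=32, lstm_hidden=128):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(1, cnn_channels, 7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(cnn_channels), nn.ReLU(),
            BasicBlock(cnn_channels), BasicBlock(cnn_channels),
        )
        self.lstm = nn.LSTM(cnn_channels, lstm_hidden, batch_first=True)
        self.fc = nn.Linear(lstm_hidden, n_classes)

    def forward(self, spec):                  # spec: (batch, 1, freq, time)
        h = self.stem(spec)                   # (batch, C, freq', time')
        h = h.mean(dim=2)                     # pool the frequency axis -> (batch, C, time')
        h = h.transpose(1, 2)                 # (batch, time', C) as an LSTM sequence
        out, _ = self.lstm(h)
        return self.fc(out[:, -1])            # classify from the last time step

logits = ResNetLSTM()(torch.randn(8, 1, 128, 300))  # 8 spectrograms, 128 bins, 300 frames
print(logits.shape)                                  # torch.Size([8, 4])
```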
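A hedged sketch of the attention-based fusion in part 3: frame-level attention weights are applied to manually extracted low-level descriptors (LLDs), the weighted summary is concatenated with the high-level ResNet-LSTM embedding, and a small DNN classifies the fused vector. All dimensions, the way attention scores are computed, and the module names are assumptions for illustration rather than the thesis's configuration.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, lld_dim=64, deep_dim=128, n_classes=4):
        super().__init__()
        self.attn = nn.Linear(lld_dim, 1)             # one score per frame
        self.classifier = nn.Sequential(              # DNN over the fused vector
            nn.Linear(lld_dim + deep_dim, 128), nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, lld_frames, deep_embedding):
        # lld_frames: (batch, time, lld_dim); deep_embedding: (batch, deep_dim)
        scores = self.attn(lld_frames)                   # (batch, time, 1)
        weights = torch.softmax(scores, dim=1)           # emphasise key frames
        lld_summary = (weights * lld_frames).sum(dim=1)  # (batch, lld_dim)
        fused = torch.cat([lld_summary, deep_embedding], dim=1)
        return self.classifier(fused)

logits = AttentionFusion()(torch.randn(8, 300, 64), torch.randn(8, 128))
print(logits.shape)   # torch.Size([8, 4])
```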
Keywords/Search Tags: emotion recognition, emotion feature set, deep learning, attention mechanism