
Speech Emotion Recognition Based On Deep Learning

Posted on: 2022-12-21
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Wang
Full Text: PDF
GTID: 2518306614458814
Subject: Automation Technology
Abstract/Summary:
Over the years, with the rapid development of deep learning, many deep learning models have been applied to speech emotion recognition. However, these models suffer from insufficient extraction of emotional features, high model complexity, and low attention to temporal features. To address these problems, this thesis builds a bidirectional gated depthwise separable convolutional neural network based on a multi-head attention mechanism, applies it to deep feature extraction and recognition of speech signals, and carries out an in-depth study and thorough analysis of the model.

To boost the high-frequency components of the speech signal and reduce the interference of silent segments, the signal is first pre-processed by pre-emphasis, framing, windowing, and endpoint detection. Mel-frequency cepstral analysis is then performed on the pre-processed signal to obtain Mel-Frequency Cepstral Coefficients (MFCC), which characterize the hand-crafted emotional features of the speech signal. This processing prepares the data for the subsequent deep learning model to automatically extract deep emotional features, effectively addressing the problem of insufficient emotional feature extraction.

A neural network model based on depthwise separable convolution and a bidirectional gated recurrent unit (DSC-BiGRU) is built. To preserve the temporal characteristics of the speech signal, a bidirectional gated recurrent unit extracts the temporal information in the features, while a depthwise separable convolution module reduces the number of model parameters and thus the model complexity. The recognition accuracy of this model on the EMO-DB dataset is 87.64%; on the CASIA dataset, its recognition accuracy is 1.19% higher than that of the bidirectional gated recurrent unit convolutional neural network (CNN-BiGRU), while the model complexity is reduced by nearly 77%, effectively addressing the problem of high model complexity.

A Multi-DSC-BiGRU neural network model is then constructed. Compared with the self-attention-guided bidirectional gated depthwise separable convolutional model (Bi-directional Gated Recurrent Unit Depth Separable Convolution based on Self-attention, Self-DSC-BiGRU), the multi-head attention mechanism pays more attention to the temporal characteristics of the features and improves the representation of key information. The recognition accuracy of the model on the EMO-DB dataset is 80.73%; on the CASIA dataset, the recognition accuracy of the Multi-DSC-BiGRU model is 2.07% higher than that of Self-DSC-BiGRU and 4.45% higher than that of the DSC-BiGRU model. This effectively addresses the model's low attention to temporal features and improves its recognition performance.

Finally, an online speech emotion recognition platform is established, and the Multi-DSC-BiGRU network model is deployed on it. The platform adopts a Browser/Server architecture, and the functional modules of the website are tested through the front-end pages in the browser. The test results verify the superiority of the proposed model and the usefulness of the speech emotion recognition website.
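The pre-processing pipeline described above (pre-emphasis, framing, windowing, and a simple endpoint detection step) can be sketched as follows. This is a minimal NumPy illustration, not the thesis's implementation; the pre-emphasis coefficient 0.97, the 25 ms frame length, the 10 ms hop at 16 kHz, and the energy threshold are common defaults assumed here.

```python
import numpy as np

def preprocess(signal, alpha=0.97, frame_len=400, hop=160):
    """Pre-emphasis, framing, Hamming windowing, and crude endpoint
    detection. All parameter values are illustrative defaults, not
    the thesis's actual configuration."""
    # Pre-emphasis boosts high frequencies: y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])

    # Split into overlapping frames
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx]

    # Apply a Hamming window to each frame to reduce spectral leakage
    frames = frames * np.hamming(frame_len)

    # Energy-based endpoint detection: drop near-silent frames
    energy = (frames ** 2).sum(axis=1)
    return frames[energy > 0.1 * energy.mean()]
```

The resulting frames would then be passed through a Mel filter bank and a discrete cosine transform to obtain the MFCC features used as model input.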
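The parameter savings claimed for the DSC-BiGRU model come from replacing standard convolutions with depthwise separable ones. The PyTorch sketch below shows the general structure under stated assumptions: all layer sizes, the kernel size, and the use of the final GRU time step for classification are illustrative choices, not the thesis's configuration.

```python
import torch
import torch.nn as nn

class DSCBiGRU(nn.Module):
    """Sketch of a DSC-BiGRU-style network: a depthwise separable
    convolution over MFCC frames followed by a bidirectional GRU.
    Dimensions are assumptions for illustration only."""

    def __init__(self, n_mfcc=40, hidden=64, n_classes=7):
        super().__init__()
        # Depthwise separable conv = per-channel depthwise conv
        # (groups = in_channels) followed by a 1x1 pointwise conv;
        # far fewer parameters than one standard convolution.
        self.depthwise = nn.Conv1d(n_mfcc, n_mfcc, kernel_size=3,
                                   padding=1, groups=n_mfcc)
        self.pointwise = nn.Conv1d(n_mfcc, 64, kernel_size=1)
        self.bigru = nn.GRU(64, hidden, batch_first=True,
                            bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, mfcc):                # mfcc: (batch, n_mfcc, time)
        x = self.pointwise(self.depthwise(mfcc))
        x = x.transpose(1, 2)               # (batch, time, 64) for the GRU
        out, _ = self.bigru(x)              # bidirectional -> 2*hidden dims
        return self.classifier(out[:, -1])  # final time step as summary
```

For the sizes above, the depthwise-plus-pointwise pair needs roughly (40·3 + 40) + (40·64 + 64) weights versus 40·64·3 + 64 for a standard kernel-3 convolution, which is the kind of reduction behind the roughly 77% complexity drop reported in the abstract.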
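The Multi-DSC-BiGRU model adds multi-head attention over the BiGRU outputs so the network can weight emotionally salient frames. A hedged sketch of that idea, using PyTorch's built-in `nn.MultiheadAttention` (the head count, feature dimension, and mean-pooling step are assumptions, not the thesis's settings):

```python
import torch
import torch.nn as nn

class MultiHeadPooling(nn.Module):
    """Illustrative multi-head self-attention over BiGRU outputs,
    in the spirit of Multi-DSC-BiGRU; dimensions are assumed."""

    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, seq):                 # seq: (batch, time, dim)
        # Self-attention: each time step attends to every other step,
        # so salient frames receive higher weights across several heads.
        ctx, weights = self.attn(seq, seq, seq)
        return ctx.mean(dim=1), weights     # pooled utterance vector
```

The pooled vector would replace the plain final-time-step summary of the base model before the classification layer; using several heads rather than a single self-attention map is what distinguishes Multi-DSC-BiGRU from Self-DSC-BiGRU in the abstract.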
Keywords/Search Tags:speech emotion recognition, mel frequency cepstrum coefficient, depth separable convolution, bi-directional gated recurrent unit, attention mechanism