
Research On Speech Emotion Recognition Model Based On Deep Neural Network

Posted on: 2020-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: J Shi
GTID: 2428330599952923
Subject: Computer technology

Abstract/Summary:
Speech emotion recognition is a research hotspot in the audio field, especially in artificial-intelligence interaction; it is widely applied in smart classrooms, intelligent driving, and smart healthcare. Speech emotion recognition aims to enable a machine to capture the emotions conveyed in the human voice through perceptual judgment, and then to adjust its interaction scheme and real-time decision-making flexibly, achieving a higher degree of human-computer interaction. This thesis studies spectrogram representations for speech emotion recognition and speech emotion recognition methods based on deep learning, focusing on the robustness of speech emotion feature extraction and the accuracy of the recognition results.

To address the cumbersome feature extraction process of traditional methods and the redundancy in the extracted features, this thesis combines polymorphic spectrograms with deep learning for end-to-end learning, improving the effectiveness of speech emotion feature extraction and enabling automated feature extraction. To address the high false recognition rate and low discrimination of traditional speech emotion recognition, this thesis proposes a multi-level speech emotion recognition framework. The framework computes the similarity of highly similar emotions with a hash algorithm, generates temporal speech emotion feature vectors with a model based on a long short-term memory (LSTM) network, and enriches the original samples and optimizes the multi-level framework with multi-sample-rate sampling, improving recognition accuracy.

The main contributions of this thesis are:

(1) Traditional speech emotion recognition methods must extract acoustic parameters, so the extraction process is cumbersome, the features do not distinguish the emotions well, and many of them are redundant or invalid. To solve these problems, this thesis proposes the speech emotion recognition model SMel-CNN. The model takes the spectrogram and the Mel spectrogram as its initial input, extracts time-frequency domain features simultaneously, and combines them to improve the effectiveness of speech feature extraction and realize automatic extraction of speech emotion features. The experiments in this thesis confirm the validity of the proposed SMel-CNN model.

(2) To address the high false recognition rate of traditional speech emotion recognition and the high similarity of some emotional expressions, this thesis proposes the multi-level speech emotion recognition framework ML-EM. The framework computes the similarity of speech emotion categories and obtains a set of highly similar emotions with a hash algorithm, and then classifies that set with the time-series model SC-LSTM proposed in this thesis. SC-LSTM first uses the SMel-CNN model to extract features from the spectrograms and then uses an LSTM network for temporal modeling. The experiments show that the ML-EM framework improves the overall discrimination and accuracy of speech emotion recognition.

(3) An automatic speech emotion recognition system is designed and implemented. The spectrogram and the Mel spectrogram of each speech segment are extracted as the initial feature layer; feature extraction and recognition are then carried out by the SMel-CNN and SC-LSTM models, yielding high-precision recognition results.
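The abstract does not include implementation details for the two input representations. As an illustrative sketch only (not the author's code), the linear spectrogram and the Mel spectrogram that serve as the model's initial input can be computed from a raw waveform with plain NumPy; the frame size, hop length, and number of Mel bands below are assumed values, not taken from the thesis.

```python
import numpy as np

def stft_magnitude(signal, n_fft=512, hop=128):
    """Magnitude spectrogram: windowed frames -> real FFT per frame."""
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = signal[start:start + n_fft] * window
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.array(frames).T  # shape: (n_fft // 2 + 1, n_frames)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular Mel filters mapping FFT bins to n_mels Mel bands."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        if center > left:   # rising slope of the triangle
            fb[i - 1, left:center] = (np.arange(left, center) - left) / (center - left)
        if right > center:  # falling slope of the triangle
            fb[i - 1, center:right] = (right - np.arange(center, right)) / (right - center)
    return fb

# One second of a 440 Hz tone as a stand-in for a speech segment.
sr = 16000
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 440.0 * t)

spec = stft_magnitude(sig)                   # linear spectrogram
mel = mel_filterbank(40, 512, sr) @ spec     # Mel spectrogram
print(spec.shape, mel.shape)                 # (257, 122) (40, 122)
```

In an SMel-CNN-style setup, both 2-D arrays would then be fed to convolutional branches as image-like inputs; this sketch only shows how the two representations are derived from the same waveform.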
Keywords/Search Tags: speech emotion recognition, convolutional neural network, long short-term memory neural network, deep learning