
Research And Application Of Speech Emotion Recognition Algorithm Based On Deep Learning

Posted on: 2021-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: S F Wu
Full Text: PDF
GTID: 2428330614463596
Subject: Signal and Information Processing
Abstract/Summary:
In recent years, human-computer interaction systems have gradually entered our daily lives. As one of the key technologies in such systems, speech emotion recognition has attracted widespread attention from researchers at home and abroad: by accurately identifying emotions, it helps machines better understand users' intentions and thereby improves the quality of human-computer interaction. With the major breakthroughs made by deep learning in image and speech recognition, scholars have begun to apply deep learning to speech emotion recognition and have proposed many algorithms based on it. This thesis studies these algorithms in depth and finds that they suffer from overly simple feature extraction, low utilization of hand-crafted features, high model complexity, and low accuracy in identifying specific emotions. To solve these problems, this thesis improves the variable-length speech emotion recognition algorithm from the perspectives of both the algorithm and the model structure. The improved algorithm not only raises recognition accuracy but also effectively reduces the complexity of the emotion recognition system. The main research work of this thesis is as follows:

First, we study existing speech emotion recognition algorithms, focusing on the variable-length algorithm based on deep neural networks, and introduce its model structure and key techniques. We evaluate the model on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset, one of the standard benchmarks for speech emotion recognition. We compare the performance of fixed-length and variable-length speech emotion recognition algorithms and demonstrate the superiority of the variable-length approach. Finally, we analyze the remaining problems of the algorithm.

Second, to address the low utilization of hand-crafted features and the overly simple feature extraction of the variable-length algorithm, this thesis improves the feature extraction algorithm and the model structure, and proposes a variable-length speech emotion recognition algorithm based on a weighted feature fusion method. Evaluated on the IEMOCAP dataset, the proposed method improves weighted accuracy (WA) and unweighted accuracy (UA) by more than 5% compared with the original algorithm.

Third, to address the high model complexity and the low accuracy in identifying specific emotions, this thesis combines a mini convolution algorithm with a multi-task learning algorithm and proposes a novel recognition algorithm. Using multi-task learning and mini convolutions, the method not only improves the accuracy of speech emotion recognition but also effectively reduces the complexity of the emotion recognition system. We evaluate both performance and complexity on the IEMOCAP dataset: compared with existing state-of-the-art methods, the proposed method improves recognition accuracy by more than 8% and reduces model complexity by 70%.
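The variable-length scheme described above processes each utterance at its natural frame length with a bidirectional LSTM, rather than padding or truncating to a fixed size. The thesis does not publish its implementation, so the following is only a minimal PyTorch sketch of that idea; the 40-dimensional frame features, layer sizes, and four-class emotion set are illustrative assumptions.

    # Minimal sketch (assumptions noted above): variable-length utterances are
    # packed so that padding never influences the BLSTM states, then mean-pooled
    # over valid frames only before classification.
    import torch
    import torch.nn as nn
    from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

    class BLSTMEmotionClassifier(nn.Module):
        def __init__(self, feat_dim=40, hidden=128, num_emotions=4):
            super().__init__()
            self.blstm = nn.LSTM(feat_dim, hidden, num_layers=2,
                                 batch_first=True, bidirectional=True)
            self.classifier = nn.Linear(2 * hidden, num_emotions)

        def forward(self, feats, lengths):
            # feats: (batch, max_frames, feat_dim), zero-padded
            # lengths: true frame count of each utterance
            packed = pack_padded_sequence(feats, lengths.cpu(),
                                          batch_first=True, enforce_sorted=False)
            out, _ = self.blstm(packed)
            out, _ = pad_packed_sequence(out, batch_first=True)
            # mean-pool only over the valid frames of each utterance
            mask = (torch.arange(out.size(1), device=out.device)[None, :]
                    < lengths[:, None]).unsqueeze(-1)
            pooled = (out * mask).sum(1) / lengths[:, None].float()
            return self.classifier(pooled)

    # Usage: 3 utterances of different lengths, 40-dim frame features (e.g. log-Mel)
    model = BLSTMEmotionClassifier()
    feats = torch.randn(3, 300, 40)
    lengths = torch.tensor([300, 220, 150])
    logits = model(feats, lengths)   # shape: (3, 4)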
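The weighted feature fusion contribution combines learned and hand-crafted representations rather than relying on either alone. The sketch below shows one plausible form of such a fusion layer with trainable branch weights; the 384-dimensional hand-crafted feature vector and the softmax-normalised weighting are assumptions, not the thesis's exact formulation.

    # Sketch of a weighted feature fusion layer under the assumptions above:
    # both branches are projected to a common size and mixed with learnable weights.
    import torch
    import torch.nn as nn

    class WeightedFeatureFusion(nn.Module):
        def __init__(self, learned_dim=256, handcrafted_dim=384,
                     fused_dim=128, num_emotions=4):
            super().__init__()
            self.proj_learned = nn.Linear(learned_dim, fused_dim)
            self.proj_hand = nn.Linear(handcrafted_dim, fused_dim)
            # one trainable weight per branch, normalised with softmax
            self.branch_logits = nn.Parameter(torch.zeros(2))
            self.classifier = nn.Linear(fused_dim, num_emotions)

        def forward(self, learned_feat, handcrafted_feat):
            w = torch.softmax(self.branch_logits, dim=0)
            fused = (w[0] * torch.tanh(self.proj_learned(learned_feat))
                     + w[1] * torch.tanh(self.proj_hand(handcrafted_feat)))
            return self.classifier(fused)

    # Usage with dummy utterance-level vectors
    fusion = WeightedFeatureFusion()
    logits = fusion(torch.randn(8, 256), torch.randn(8, 384))   # (8, 4)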
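The third contribution pairs a lightweight ("mini") convolutional encoder with multi-task learning. The hedged sketch below illustrates that structure: a small shared encoder, an emotion head, an auxiliary head, and a weighted sum of losses. The auxiliary task (speaker gender) and the loss weight of 0.3 are assumptions chosen only for illustration; the abstract does not state which auxiliary task the thesis uses.

    # Sketch: a small shared convolutional encoder with two task heads,
    # trained on a weighted sum of the two cross-entropy losses.
    import torch
    import torch.nn as nn

    class MiniConvMultiTask(nn.Module):
        def __init__(self, feat_dim=40, channels=32, num_emotions=4):
            super().__init__()
            # depthwise-separable style convolutions keep the parameter count small
            self.encoder = nn.Sequential(
                nn.Conv1d(feat_dim, channels, kernel_size=5, padding=2),
                nn.ReLU(),
                nn.Conv1d(channels, channels, kernel_size=3, padding=1, groups=channels),
                nn.Conv1d(channels, channels, kernel_size=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),
            )
            self.emotion_head = nn.Linear(channels, num_emotions)
            self.gender_head = nn.Linear(channels, 2)   # auxiliary task (assumed)

        def forward(self, feats):
            # feats: (batch, frames, feat_dim); Conv1d expects (batch, feat_dim, frames)
            h = self.encoder(feats.transpose(1, 2)).squeeze(-1)
            return self.emotion_head(h), self.gender_head(h)

    # Usage: weighted multi-task loss on dummy labels
    model = MiniConvMultiTask()
    emo_logits, gen_logits = model(torch.randn(8, 300, 40))
    loss = (nn.functional.cross_entropy(emo_logits, torch.randint(0, 4, (8,)))
            + 0.3 * nn.functional.cross_entropy(gen_logits, torch.randint(0, 2, (8,))))
    loss.backward()

Sharing one small encoder across tasks is what lets this kind of design cut parameter count while the auxiliary signal regularises the emotion head; the specific layer sizes here are not taken from the thesis.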
Keywords/Search Tags: Speech Emotion Recognition, Bidirectional Long Short-Term Memory, Weighted Feature Fusion, Multi-task Learning, Mini Convolutional Network, Mini Deep Neural Network