
Speech Emotion Recognition Based On Neural Network And Attention Mechanism

Posted on: 2020-10-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y Lu
Full Text: PDF
GTID: 2428330572472361
Subject: Electronic Science and Technology
Abstract/Summary:
With the continuous development of the computer industry, artificial intelligence has entered people's lives, and speech has gradually become the main channel of human-computer interaction. Speech emotion recognition enables machines to perceive and understand human emotions, and it has broad application prospects in mental health monitoring, education assistance, personalized content recommendation, and customer service quality monitoring. However, the recognition rate of current speech emotion recognition systems is not yet sufficient for large-scale commercial use, so improving recognition accuracy remains a difficult open problem.

The general framework of a speech emotion recognition system consists of two parts: speech emotion feature extraction and emotion classification. This paper proposes three innovations to optimize the classifier model.

1. In the traditional LSTM model, uniformly encoding the variable-length sequence of hidden vectors into a fixed-length vector easily causes information loss, and treating every speech frame as equally important is inconsistent with the actual situation. We propose a self-attention mechanism model based on LSTM, which expresses the utterance-level emotion vector as a weighted average of the frame-level hidden vectors. The weight of each frame-level hidden vector is learned automatically by the attention mechanism, so the model can extract more emotionally discriminative features to distinguish different speech emotions and improve recognition accuracy.

2. A single subspace of the LSTM model provides insufficient feature representation. Building on self-attention, we further propose a multi-head attention mechanism model that learns feature representations at different subspace locations, so the model can capture more comprehensive emotional features across multiple subspaces, thus improving the
accuracy of emotion recognition.

3. Emotional data is costly to collect, while deep learning models require large amounts of data. We propose adding noise and reverberation in different proportions to expand a limited data set quickly and at low cost, thereby improving the performance of the emotion recognition system.

To verify the performance of the algorithms, we compare the three proposed methods with an LSTM baseline model on the IEMOCAP and EMODB databases. The experimental results show that all three methods perform better than the baseline, and the applicable scenarios and usage boundary conditions of some methods are given. Finally, combining the three methods yields the largest performance improvement: compared with the LSTM baseline model, the attention model proposed in this paper reduces the emotion recognition error rate by 22.78% on EMODB and by 16.09% on IEMOCAP, a significant improvement in accuracy.
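The attention pooling described in points 1 and 2 above can be sketched as follows. This is a minimal NumPy illustration, not the thesis's actual model: the function names, the 8-dimensional hidden size, the 50-frame sequence, and the use of a single learned vector per head to score frames are all assumptions made for the example.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def attentive_pooling(H, w):
    """H: (T, d) frame-level LSTM hidden vectors; w: (d,) learned attention vector.
    Returns a fixed-length utterance embedding as a weighted average of frames,
    instead of uniformly compressing all T frames into one vector."""
    scores = H @ w           # (T,) one relevance score per frame
    alpha = softmax(scores)  # attention weights, non-negative and summing to 1
    return alpha @ H         # (d,) weighted average of frame-level vectors

def multi_head_pooling(H, W):
    """Multi-head variant: W is (h, d), one attention vector per head.
    Each head attends in its own subspace; outputs are concatenated."""
    return np.concatenate([attentive_pooling(H, w) for w in W])

rng = np.random.default_rng(0)
H = rng.standard_normal((50, 8))                    # 50 frames, 8-dim hidden states
emb = attentive_pooling(H, rng.standard_normal(8))  # shape (8,)
mh = multi_head_pooling(H, rng.standard_normal((4, 8)))  # 4 heads -> shape (32,)
print(emb.shape, mh.shape)
```

In a trained model the attention vectors would be learned jointly with the LSTM by backpropagation; here they are random only to show the shapes and the weighted-average computation.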
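The data expansion in point 3 above can be sketched as additive noise at a target signal-to-noise ratio plus convolution with a room impulse response. This is a simplified illustration, not the thesis's pipeline: the function names, the synthetic 440 Hz test tone, the white-noise source, and the exponentially decaying impulse response are all assumptions made for the example.

```python
import numpy as np

def add_noise(signal, noise, snr_db):
    """Mix noise into a clean signal at a target SNR in dB."""
    ps = np.mean(signal ** 2)                          # signal power
    pn = np.mean(noise ** 2)                           # noise power
    scale = np.sqrt(ps / (pn * 10 ** (snr_db / 10)))   # scale noise to hit the target SNR
    return signal + scale * noise[: len(signal)]

def add_reverb(signal, rir):
    """Simulate reverberation by convolving with a room impulse response,
    trimmed back to the original length."""
    return np.convolve(signal, rir)[: len(signal)]

rng = np.random.default_rng(0)
sr = 16000
clean = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)   # 1 s of a 440 Hz tone
noisy = add_noise(clean, rng.standard_normal(sr), snr_db=10)
rir = np.exp(-np.arange(800) / 100) * rng.standard_normal(800)  # toy decaying impulse response
reverbed = add_reverb(clean, rir)
```

Applying such transforms at several SNRs and with several impulse responses multiplies the size of a small emotion corpus without new recording sessions, which is the low-cost expansion the abstract describes.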
Keywords/Search Tags: Speech emotion recognition, LSTM, Self-Attention, Multi-head Attention, Data Augmentation