
Research On Speech Dialogue Emotion Recognition Based On Deep Learning

Posted on: 2024-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Zhang
Full Text: PDF
GTID: 2568307094979389
Subject: Master of Electronic Information (Professional Degree)

Abstract/Summary:
Speech emotion recognition in dialogue has gradually found wide use in fields such as affective computing, human-computer interaction, and intelligent perception, and provides indispensable support for the development of human-computer intelligence. By analyzing the speech in a dialogue, accurately identifying the emotional state expressed in each utterance, and giving appropriate feedback for different emotional states, a computer can intelligently and effectively assist humans in their work, which is of great significance for the ongoing informatization and intelligentization of industry.

Although current dialogue emotion recognition methods have achieved relatively good results, most existing methods are text-based, and there is little research on speech-based dialogue emotion recognition. In addition, existing methods have the following defects. First, the models have limited feature-extraction ability and cannot fully represent the emotional information in speech. Second, most existing methods ignore the potential relevance of the dialogue context to emotion and the influence of the current speaker's information; in a dialogue emotion recognition model, extracting context and speaker information is critical to the recognition result. To address these problems, this thesis makes the following contributions.

To extract key emotional information from lengthy speech, this thesis proposes a CNN-BiLSTM network based on multiple attention mechanisms that better extracts emotional features from speech. In this network, a spatial attention mechanism and a channel attention mechanism are first added to the CNN; the two cooperate to locate the spatial positions of key features and to model the importance of channel features, so that the network focuses on key emotional information. A BiLSTM network combined with a temporal attention mechanism is then used to capture the relevance of the time-series features. The model's WA on the EMODB, IEMOCAP, and EESDB speech datasets reached 87.9%, 76.5%, and 75.2%, respectively, and its UA reached 87.6%, 73.6%, and 70.2%, respectively.

To exploit two important characteristics of dialogue, the relevance between dialogue contexts and between speakers, this thesis proposes a dialogue network model based on context and speakers. The model uses a BiGRU to extract contextual features between sentences and feeds them to an attention mechanism to extract key features. The result is split into two parts: one part is sent to a BiGRU for sentence-to-sentence context modeling, the other to a BiGRU for speaker modeling, and the two results are fused to obtain the final emotion classification. In addition, because sample imbalance is a common problem in current speech-dialogue datasets, this thesis uses the focal loss function to reduce its impact. The predictive performance of the model is measured on two public datasets, EEIDB and IEMOCAP: the F1 score reached 52.57% and 49.33%, respectively, and the WA reached 60.43% and 50.88%, respectively. The experimental results show that the proposed method is effective for speech dialogue emotion recognition and has research value.

Figures: 42; Tables: 11; References: 91
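To make the channel- and spatial-attention idea concrete, here is a minimal NumPy sketch, not the thesis's exact architecture: a channel gate is computed by globally average-pooling each channel of a feature map (squeeze-and-excitation style, with the learnable layers omitted for brevity), and a spatial gate by pooling across channels, so that informative channels and spatial positions are up-weighted. The shapes and the `sigmoid` gating are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(fmap):
    """Gate each channel by a scalar derived from its global average.

    fmap: feature map of shape (C, H, W).
    """
    pooled = fmap.mean(axis=(1, 2))          # squeeze: one scalar per channel, shape (C,)
    weights = sigmoid(pooled)                # per-channel gate in (0, 1)
    return fmap * weights[:, None, None]     # broadcast gate over H and W

def spatial_attention(fmap):
    """Gate each spatial position by a scalar derived from the channel mean."""
    pooled = fmap.mean(axis=0)               # collapse channels, shape (H, W)
    weights = sigmoid(pooled)                # per-position gate in (0, 1)
    return fmap * weights[None, :, :]        # broadcast gate over channels

# Apply the two gates in sequence, as the thesis describes them cooperating.
fmap = np.ones((4, 3, 3))                    # toy feature map: 4 channels, 3x3 spatial grid
out = spatial_attention(channel_attention(fmap))
```

In a real network the pooled statistics would pass through small learnable layers before the sigmoid; the sketch keeps only the pool-gate-rescale structure that lets the model emphasize key emotional features.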
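The focal loss mentioned above down-weights easy, well-classified examples so that training focuses on hard examples from minority classes. A small sketch of the standard binary form (the thesis does not state its exact hyperparameters; `gamma=2.0` and `alpha=0.25` below are the commonly used defaults, assumed here for illustration):

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one example.

    p: predicted probability of class 1; y: true label in {0, 1}.
    The (1 - p_t)**gamma factor shrinks the loss on easy examples,
    and alpha rebalances the two classes.
    """
    p_t = p if y == 1 else 1.0 - p            # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` and `alpha=0.5` this reduces to half the ordinary cross-entropy; raising `gamma` makes confidently correct predictions contribute almost nothing, which is what mitigates the class-imbalance problem in the dialogue datasets.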
Keywords/Search Tags:Speech dialogue emotion recognition, Attention mechanism, Human-computer interaction, CNN-BiLSTM, BiGRU