
Speaker Emotional State Recognition Based On Speech And Text Fusion

Posted on: 2021-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: D J Wang
GTID: 2518306314483014
Subject: Master of Engineering
Abstract/Summary:
Emotion recognition is a challenging task because people express their emotions in subtle and complex ways. Although research on emotion recognition has made great progress in recent years, humans still cannot interact naturally with machines, and for many human-computer interaction applications an emotional system that understands people is essential. To recognize human emotions, researchers usually extract emotion-related features from a single signal, such as audio, and then use those features to train a classifier. Single-modal emotion recognition, however, tends to suffer from a low recognition rate and poor robustness, so improving recognition accuracy has become an urgent problem. This thesis integrates feature information from two modalities, speech and text, conducts dual-modal emotion recognition research on that basis, and improves both the emotional feature extraction and the classifier of the dual-modal emotion recognition model. The main research contents of this thesis are:

1. Deep-learning-based feature extraction and emotion recognition models for each modality. For the audio modality, an audio recurrent encoder model is proposed. The open-source tool pyAudioAnalysis is used to extract basic features from the speech signal, and the Short-Time Fourier Transform (STFT) is used to extract frequency-domain features from the spectrogram. The basic and frequency-domain features are combined by direct fusion to obtain the final speech features, which are then learned by a combination of a convolutional neural network (CNN) and a bidirectional long short-term memory (BiLSTM) network to perform emotion recognition. A comparative experiment confirms that adding the spectrogram's frequency-domain features improves emotion recognition accuracy. For the text modality, a text recurrent encoder model is proposed: the Natural Language Toolkit (NLTK) performs word segmentation (tokenization) on the text, and a BiLSTM extracts the textual emotion features, allowing the model to make proper use of the contextual semantics and word-order information of the text. For the combined speech and text features, a dual-modal recurrent encoder model is proposed: the speech and text features extracted by the two models above are merged by direct concatenation (cascade) fusion and then passed through a fully connected layer that classifies emotions into four categories: happy, angry, sad, and neutral.

2. Fusion methods for bimodal features. To address the shortcomings of direct concatenation for dual-modal feature fusion, a dual-modal emotion recognition model with a fused attention mechanism is proposed. The mechanism measures the correlation between each single-modal feature vector and the emotional signal and assigns weights according to this relevance. This focuses the system on emotion-related speech frames and text vectors, makes the fused features of the two modalities more effective, and thereby improves recognition performance. In addition, to verify how the information at the intermediate nodes of the BiLSTM layer affects model performance, the author adds a model structure that incorporates statistics pooling. The experimental results show that the intermediate-node information of the BiLSTM layer does yield some performance improvement, but the gain is less pronounced than that of the attention mechanism.
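The speech front end described in item 1 (basic short-term features plus STFT spectrogram features, combined by direct fusion) can be sketched as below. This is a minimal illustration, not the thesis's code: energy and zero-crossing rate stand in for the larger pyAudioAnalysis short-term feature set, and the 25 ms window / 10 ms hop at 16 kHz (`frame_len=400`, `hop=160`) and all function names are hypothetical choices.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]  # (n_frames, frame_len)

def basic_features(frames):
    """Hypothetical stand-in for pyAudioAnalysis short-term features:
    per-frame energy and zero-crossing rate."""
    energy = np.sum(frames ** 2, axis=1) / frames.shape[1]
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return np.stack([energy, zcr], axis=1)  # (n_frames, 2)

def stft_log_magnitude(frames):
    """STFT frequency-domain features: Hamming window, real FFT, log magnitude."""
    window = np.hamming(frames.shape[1])
    spec = np.abs(np.fft.rfft(frames * window, axis=1))
    return np.log(spec + 1e-10)  # (n_frames, frame_len // 2 + 1)

def fused_speech_features(x, frame_len=400, hop=160):
    """Direct fusion: concatenate both feature sets frame by frame."""
    frames = frame_signal(x, frame_len, hop)
    return np.concatenate([basic_features(frames), stft_log_magnitude(frames)], axis=1)
```

For one second of a 440 Hz tone sampled at 16 kHz, this yields 98 frames of 203 features (2 basic + 201 spectral bins). Concatenation is the simplest form of "direct fusion"; the thesis does not state how its two feature sets were aligned or normalized.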
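The CNN stage of the audio encoder can be pictured as convolution along the time axis of the fused frame features, followed by pooling, before the sequence reaches the BiLSTM. The sketch below assumes a single valid 1-D convolution layer with ReLU and non-overlapping max pooling; the filter count, kernel width and pool size are hypothetical, since the abstract does not specify the CNN configuration.

```python
import numpy as np

def conv1d_relu(x, kernels, bias):
    """Valid 1-D convolution over time, then ReLU.

    x:       (T, D) frame-level features
    kernels: (C, K, D) C filters, each spanning K consecutive frames
    bias:    (C,)
    returns  (T - K + 1, C)
    """
    T, _ = x.shape
    C, K, _ = kernels.shape
    out = np.empty((T - K + 1, C))
    for t in range(T - K + 1):
        window = x[t:t + K]  # (K, D) slice of consecutive frames
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)

def max_pool_time(x, size):
    """Non-overlapping max pooling along time; trailing frames that do not fill a pool are dropped."""
    T = (x.shape[0] // size) * size
    return x[:T].reshape(-1, size, x.shape[1]).max(axis=1)
```

With 98 input frames, a kernel width of 5 and a pool size of 2, the output is a shorter sequence of 47 feature vectors, which is what a BiLSTM layer would consume next.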
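Both recurrent encoders rely on a BiLSTM, which reads the sequence in both directions so that every time step carries context from the past and the future. A minimal forward pass in NumPy is sketched below, assuming the standard LSTM gate equations with gate order input, forget, cell, output; the parameter layout and helper names are hypothetical, and a real model would use a deep-learning framework's LSTM layer with trained weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_pass(x, W, U, b):
    """Run one LSTM direction over x of shape (T, D).

    W: (4H, D), U: (4H, H), b: (4H,); gate order is input, forget, cell, output.
    Returns the hidden state at every time step, shape (T, H).
    """
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    outputs = []
    for x_t in x:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[:H])             # input gate
        f = sigmoid(z[H:2 * H])        # forget gate
        g = np.tanh(z[2 * H:3 * H])    # candidate cell state
        o = sigmoid(z[3 * H:])         # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)

def bilstm(x, fwd_params, bwd_params):
    """Concatenate forward states with time-aligned backward states -> (T, 2H)."""
    h_fwd = lstm_pass(x, *fwd_params)
    h_bwd = lstm_pass(x[::-1], *bwd_params)[::-1]  # reverse back to original time order
    return np.concatenate([h_fwd, h_bwd], axis=1)

def init_params(rng, D, H, scale=0.1):
    """Small random weights and zero bias for one direction (illustrative only)."""
    return (rng.standard_normal((4 * H, D)) * scale,
            rng.standard_normal((4 * H, H)) * scale,
            np.zeros(4 * H))
```

The backward half of the output at the first time step already depends on the last input, which is how a BiLSTM exploits word order and context in both directions.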
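Before the text BiLSTM, each utterance has to be tokenized and mapped to a fixed-length sequence of word indices that an embedding layer can look up. The sketch below uses a simple regular expression as a stand-in for NLTK tokenization (for example `nltk.word_tokenize`, which splits contractions differently and needs NLTK's tokenizer data); the vocabulary scheme, padding tokens and function names are hypothetical.

```python
import re
from collections import Counter

PAD, UNK = "<pad>", "<unk>"

def tokenize(text):
    """Lower-case word/punctuation split; a regex stand-in for NLTK tokenization."""
    return re.findall(r"[a-z0-9']+|[^\sa-z0-9']", text.lower())

def build_vocab(texts, min_count=1):
    """Assign ids to tokens seen at least min_count times; 0 = padding, 1 = unknown."""
    counts = Counter(tok for t in texts for tok in tokenize(t))
    vocab = {PAD: 0, UNK: 1}
    for tok, n in sorted(counts.items()):
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab

def encode(text, vocab, max_len):
    """Map tokens to ids in their original order, then truncate or right-pad to max_len."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokenize(text)][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))
```

Usage: with a vocabulary built from `["I am happy!", "I am sad."]`, `encode("I am angry!", vocab, 6)` returns `[6, 4, 1, 2, 0, 0]`, where the unseen word "angry" maps to the unknown id 1. Keeping the ids in sentence order is what lets the downstream BiLSTM use word-order information.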
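The dual-modal recurrent encoder's direct cascade fusion reduces to concatenating an utterance-level speech vector with an utterance-level text vector and applying one fully connected layer with a softmax over the four emotion classes. The sketch assumes each modality has already been summarized into a single vector (how the thesis does this is not stated in the abstract); the weights and function names are hypothetical.

```python
import numpy as np

EMOTIONS = ("happy", "angry", "sad", "neutral")

def softmax(z):
    z = z - np.max(z)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def concat_fusion_classify(speech_vec, text_vec, W, b):
    """Direct cascade fusion: concatenate both modality vectors, then a dense layer + softmax.

    W: (4, len(speech_vec) + len(text_vec)), b: (4,)
    Returns the predicted label and the class probabilities.
    """
    fused = np.concatenate([speech_vec, text_vec])
    probs = softmax(W @ fused + b)
    return EMOTIONS[int(np.argmax(probs))], probs
```

Because concatenation gives every dimension of both modalities equal standing, the classifier itself must learn which parts matter; this is the weakness that item 2 addresses with attention.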
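The attention mechanism in item 2 scores each speech frame or text vector for emotional relevance, normalizes the scores into weights, and pools the sequence as a weighted sum. The abstract does not give the scoring function, so the sketch below assumes standard additive (Bahdanau-style) attention, score_t = v·tanh(W h_t), applied to each modality separately before concatenation; the parameter shapes and function names are hypothetical.

```python
import numpy as np

def attention_pool(H, W_a, v_a):
    """Additive attention over time.

    H:   (T, D) encoder outputs (speech frames or text tokens)
    W_a: (A, D), v_a: (A,) learned attention parameters
    Returns the weighted summary (D,) and the attention weights (T,).
    """
    scores = np.tanh(H @ W_a.T) @ v_a      # (T,) relevance score per frame/token
    scores = scores - scores.max()          # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ H, alpha

def attentive_fusion(H_speech, H_text, speech_params, text_params):
    """Pool each modality with its own attention, then concatenate the summaries."""
    s, _ = attention_pool(H_speech, *speech_params)
    t, _ = attention_pool(H_text, *text_params)
    return np.concatenate([s, t])
```

Frames or tokens with high scores dominate the pooled vector, whereas with all-zero scores the weights become uniform and the result is a plain average over time.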
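Statistics pooling uses the BiLSTM outputs at every time step (the layer's intermediate nodes) rather than only the final state, summarizing them by their per-dimension mean and standard deviation. The sketch assumes exactly these two statistics, which is the common definition; the thesis may use a different set, and the function name is hypothetical.

```python
import numpy as np

def statistics_pooling(H):
    """Pool BiLSTM outputs H of shape (T, D) over time.

    Every time step contributes, not only the last one.
    Returns the per-dimension mean and (population) standard deviation
    concatenated into one vector of length 2D.
    """
    return np.concatenate([H.mean(axis=0), H.std(axis=0)])
```

Unlike attention, every time step receives the same weight here, which is consistent with the reported result that statistics pooling helps but less than attention does.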
Keywords/Search Tags: Emotion Recognition, Convolutional Neural Network, Bidirectional Long Short-Term Memory Neural Network, Attention Mechanism, Bimodal Fusion