
Research On Speech Emotion Recognition Method Based On Multi-feature And Multi-modal Fusion

Posted on: 2022-09-13
Degree: Master
Type: Thesis
Country: China
Candidate: Q Cao
Full Text: PDF
GTID: 2518306569997369
Subject: Computer technology
Abstract/Summary:
Studies have shown that when people remain in emotional states such as anxiety, sadness, and anger for a long time, the concentrations of glucose and fatty acids in the body rise, which can lead to cardiovascular and cerebrovascular diseases; in other words, long-term negative emotions can cause illness. As the pace of life accelerates, people increasingly stay in anxiety, panic, and other negative emotional states for long periods, and some develop such diseases as a result. Clearly, if individuals could accurately recognize their own emotional state and regulate their emotions, these diseases could be avoided. However, the performance of existing emotion recognition methods cannot yet meet application requirements, so this thesis focuses on speech emotion recognition methods based on multi-feature and multi-modal fusion, improving recognition performance from both the multi-feature and the multi-modal perspective. The main work of this thesis is as follows:

An emotion recognition algorithm based on a multi-path recurrent neural network is proposed. The network first takes static and dynamic features as input, then uses separate recurrent paths to learn the different emotional features and reduce interference between them, and finally applies an attention mechanism to focus on emotionally salient speech frames and classify the emotion (see the first code sketch below). Experimental results on the IEMOCAP dataset demonstrate the effectiveness of the multi-path recurrent neural network.

An emotion recognition algorithm based on a gating unit and a hierarchical network is proposed. This algorithm improves on the multi-path recurrent network and contains three layers in total. The first layer performs feature encoding to obtain initial representations of the static and dynamic features. The second layer performs feature fusion: a gated fusion unit learns the contribution of the static and dynamic features to the emotion at each moment and integrates the two features according to these contribution values into an intermediate emotion representation (see the second sketch below). The third layer performs emotion classification, extracting and classifying emotion-related features with an attention mechanism. Experimental results on the IEMOCAP dataset show that the proposed hierarchical network achieves results competitive with the state of the art in speech emotion recognition, reaching 72.5% unweighted accuracy (UA).

A multi-modal emotion recognition algorithm based on temporal and semantic consistency is proposed. The algorithm consists of three parts: feature encoding, feature fusion, and emotion classification. First, the feature encoding part encodes the speech and text data to obtain primary features. Then, in the feature fusion part, a speech-text alignment module maintains the temporal consistency of the data from the two domains, and a context-aware cross-attention fuses the emotional features within a shared semantic space (see the third sketch below). Finally, the emotion classification part takes the fused features as input and extracts higher-level features for classification. On the IEMOCAP dataset, the proposed method based on temporal and semantic consistency achieves 76.64% UA, the best result reported in this field at the time.
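The abstract does not give implementation details, so the following is a minimal sketch of the multi-path recurrent structure it describes: one LSTM path for static frame-level features and one for dynamic features, per-path attention pooling over frames, and a joint classifier. All layer names, dimensions, and the use of PyTorch are assumptions for illustration.

```python
import torch
import torch.nn as nn

class MultiPathRNN(nn.Module):
    """Sketch: static and dynamic frame features flow through separate
    recurrent paths (reducing interference), attention pooling weights
    emotionally salient frames, and a linear head classifies emotion."""

    def __init__(self, static_dim=40, dynamic_dim=40, hidden=128, n_classes=4):
        super().__init__()
        self.static_path = nn.LSTM(static_dim, hidden, batch_first=True)
        self.dynamic_path = nn.LSTM(dynamic_dim, hidden, batch_first=True)
        # One learned frame-level attention scorer per path.
        self.attn_s = nn.Linear(hidden, 1)
        self.attn_d = nn.Linear(hidden, 1)
        self.classifier = nn.Linear(2 * hidden, n_classes)

    @staticmethod
    def attend(h, scorer):
        # h: (batch, frames, hidden) -> attention-weighted sum over frames
        w = torch.softmax(scorer(h), dim=1)   # (batch, frames, 1)
        return (w * h).sum(dim=1)             # (batch, hidden)

    def forward(self, static_feats, dynamic_feats):
        hs, _ = self.static_path(static_feats)    # (B, T, H)
        hd, _ = self.dynamic_path(dynamic_feats)  # (B, T, H)
        pooled = torch.cat([self.attend(hs, self.attn_s),
                            self.attend(hd, self.attn_d)], dim=-1)
        return self.classifier(pooled)            # emotion logits
```

In this reading, "multi-path" means the two feature types never share recurrent weights; they only meet at the pooled, attention-weighted representations.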
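The gated fusion unit in the second algorithm can be read as a per-moment sigmoid gate that mixes the two feature encodings according to their estimated contribution. A minimal sketch, assuming frame-synchronous static and dynamic encodings of equal dimension (the thesis's exact gating formulation is not given in the abstract):

```python
import torch
import torch.nn as nn

class GatedFusionUnit(nn.Module):
    """Sketch: at each frame, a sigmoid gate estimates how much the
    static vs. dynamic encoding should contribute, then mixes the two
    into one intermediate emotion representation."""

    def __init__(self, dim=128):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, h_static, h_dynamic):
        # h_*: (batch, frames, dim); gate values lie in (0, 1)
        g = torch.sigmoid(self.gate(torch.cat([h_static, h_dynamic], dim=-1)))
        return g * h_static + (1.0 - g) * h_dynamic
```

The convex combination guarantees the fused representation stays in the span of the two inputs while letting the contribution vary frame by frame and channel by channel.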
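For the third algorithm, a hedged sketch of cross-attention fusion after alignment: text-side queries attend over speech-side keys and values, so the two modalities are fused in the same semantic space. How the alignment module works and what makes the attention "context-aware" are not specified in the abstract, so this shows only standard cross-attention with a residual connection; all names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Sketch: assumes speech frames have already been aligned to text
    tokens upstream (temporal consistency), so both sequences share a
    time axis; text queries then attend over speech keys/values."""

    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_feats, speech_feats):
        # text_feats, speech_feats: (batch, tokens, dim) after alignment
        attended, _ = self.cross_attn(query=text_feats,
                                      key=speech_feats,
                                      value=speech_feats)
        return self.norm(text_feats + attended)  # residual fusion
```

A classifier head over this fused sequence (e.g., pooled then linear, as in the first sketch) would complete the emotion classification part described above.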
Keywords/Search Tags:speech emotion recognition, multi-modal emotion recognition, attention mechanism, gated unit, temporal and semantic consistency