
Research On Emotion Recognition Method Of Multimodal Fusion Based On Dialogue

Posted on: 2021-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: Q L Li
Full Text: PDF
GTID: 2428330647963103
Subject: Information and Communication Engineering

Abstract/Summary:
The central goal of emotion recognition is to enable machines to perceive human emotions: not only to understand the literal meaning of what people express, but also to read their emotional state from facial expressions, words, and voice. A machine that can perceive human emotions can serve people in a more personalized way. Research on emotion recognition is therefore significant both for the future development of artificial intelligence and for its commercial value.

Sentiment analysis is a basic task in natural language processing. Processing based on a single modality (text) is now very mature, but multimodal emotion recognition over text, images, and sound introduces additional challenges. Interpersonal communication is inherently multimodal: people's emotions are usually carried by words (the textual modality), facial expressions (the visual modality), and changes in tone and intonation (the acoustic modality). People can easily sense changes in each other's emotions, yet it remains very difficult for machines to recognize emotions accurately. Studying multimodal emotion recognition helps machines do this better.

In multimodal emotion recognition, the most common approach is to extract features from each modality separately and fuse them by simple concatenation. This causes information redundancy and, by default, treats every feature vector as equally important, so the resulting recognition performance is not ideal. To address this problem, this thesis adopts two methods for multimodal emotion recognition: (1) extract feature vectors from the inputs (text, audio, and images), feed the extracted vectors into a self-attention network for feature fusion, connect the fused vectors to a fully connected network, and classify emotions with softmax; (2) process the input video with the VideoBERT model, feed the extracted visual tokens and text tokens into a BERT network for feature extraction, and pass the extracted features through a self-attention layer followed by a fully connected layer for classification. Minimal sketches of both methods are given after this abstract.

Single-modal and multimodal emotion recognition experiments are conducted on the MELD dataset, comparing the recognition accuracy of the two proposed methods against other emotion recognition methods. To verify the generalizability of the proposed methods, they are also evaluated on the CMU-MOSEI dataset. The experimental results show that both proposed models recognize emotions effectively. On the same dataset, multimodal emotion recognition outperforms single-modal recognition, the feature fusion method based on the self-attention mechanism achieves higher accuracy than the other methods, and the VideoBERT-based method is comparable to the current best method. The emotion recognition method is also applied to a real dialogue-based artificial intelligence customer service system, where it performs well and improves customer satisfaction.
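The abstract does not give implementation details, so the following is a minimal PyTorch sketch of method (1), self-attention feature fusion followed by a fully connected classifier and softmax. The feature dimension, number of attention heads, and the seven-class output (MELD's emotion set) are illustrative assumptions, not the thesis's actual hyperparameters.

```python
import torch
import torch.nn as nn

class SelfAttentionFusion(nn.Module):
    """Sketch of method (1): per-modality feature vectors are fused with
    self-attention, then classified by a fully connected layer + softmax.
    All sizes below are assumptions for illustration."""
    def __init__(self, feat_dim=256, num_heads=4, num_classes=7):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, text_feat, audio_feat, visual_feat):
        # Stack the three modality vectors as a length-3 "sequence" so that
        # self-attention can weigh each modality against the others,
        # instead of treating all vectors as equally important.
        x = torch.stack([text_feat, audio_feat, visual_feat], dim=1)  # (B, 3, D)
        fused, _ = self.attn(x, x, x)      # self-attention fusion
        pooled = fused.mean(dim=1)         # aggregate the fused modalities
        return self.classifier(pooled).softmax(dim=-1)  # emotion probabilities

# Usage with random features standing in for real modality encoders:
model = SelfAttentionFusion()
t, a, v = (torch.randn(8, 256) for _ in range(3))
probs = model(t, a, v)  # shape (8, 7)
```

Method (2) can be sketched in the same spirit. Here the VideoBERT-style visual tokenization (clustering video-frame features into a discrete vocabulary and mapping them to token ids) is assumed to happen upstream, and a standard pretrained `bert-base-uncased` from the `transformers` library stands in for the thesis's network; these choices are assumptions, not the thesis's configuration.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class VideoBertEmotion(nn.Module):
    """Sketch of method (2): interleaved text and visual tokens go through
    BERT for feature extraction, then a self-attention layer and a fully
    connected classifier. Visual tokens are assumed pre-extracted."""
    def __init__(self, num_classes=7, num_heads=4):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        dim = self.bert.config.hidden_size
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, input_ids, attention_mask):
        # input_ids interleaves text tokens and visual tokens, VideoBERT-style.
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        fused, _ = self.attn(hidden, hidden, hidden)  # extra self-attention layer
        pooled = fused.mean(dim=1)
        return self.classifier(pooled)                # logits over emotion classes
```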
Keywords/Search Tags: Emotion recognition, Neural networks, Multimodal, VideoBERT