
Research On Multimodal Emotion Recognition Based On Natural Language Characteristics

Posted on: 2022-03-13    Degree: Master    Type: Thesis
Country: China    Candidate: J Yang    Full Text: PDF
GTID: 2518306515464294    Subject: Internet of Things Engineering
Abstract/Summary:
With the rapid development of short-video platforms such as TikTok and Kuaishou, more and more creators record their lives and express their views through short videos. This online content is composed of different forms of information, such as text, speech and video images, and is called multimodal information. Multimodal information carries a large amount of sentiment and emotion, and accurately analyzing the emotional tendency of such multimodal content is of vital importance for maximizing the value of Internet information.

With the current development of science and technology, intelligent machines have entered countless households. Only when an intelligent machine can quickly and accurately judge a person's emotional state can it further analyze and understand human emotions, respond more intelligently, and achieve harmonious and friendly human-computer interaction. In intelligent healthcare, for example, automatically recognizing a patient's emotional state makes it easier to calm the patient, helps doctors refine the treatment plan, and improves the quality of medical service. In an intelligent driving system, the driver's emotional state can be monitored in real time to detect drunk driving, fatigued driving and other abnormal conditions, and timely warnings can be issued accordingly, effectively helping to avoid traffic accidents. On the one hand, human-computer interaction has become more frequent in the era of artificial intelligence, raising the public's expectations of user experience: people hope that intelligent machines can observe, understand and generate emotional signals in a human-like way. On the other hand, artificial intelligence, as an important engine of future economic growth, is the subject of fierce competition among countries around the world.

Against this background, this thesis takes data from three modalities, text, speech and images, as its research object. Giving full consideration to contextual information and hierarchical feature information, it captures both the independence of each single modality and the interactions between modalities, and studies the construction of multimodal sentiment analysis models based on deep learning. The main work and contributions are as follows:

1. A deep learning model for text sentiment analysis based on an attention mechanism, AT-DPCNN (Divide-Pooling Convolutional Neural Network with Attention Mechanism), is established. An attention matrix is used to focus on the parts of a sentence that have the greatest influence on its emotional orientation. The extracted attention feature matrix is combined with the original word vectors to form an attention input matrix, from which features are then extracted again by a CNN (Convolutional Neural Network). To better extract features from complex sentence patterns such as transitions, the pooling operation in the pooling layer is split (see the first sketch below). The model was tested on several different datasets, and the experimental results show that it not only achieves high detection accuracy but also generalizes better; in particular, the classification accuracy and F1 score improve significantly on complex sentence patterns such as transitions.

2. A multimodal sentiment analysis method based on contextual temporal information is established, which extends the text-centered model to the other modalities and selectively learns cross-modal features through an improved attention mechanism and a gating mechanism that capture long-range context (see the second sketch below). By examining the modalities from different perspectives, the method explores the potential connections between them, so that the information in different modalities complements each other and expresses the emotional attributes of the content to the greatest extent. In the model experiments, the complex features of the audio information are given less weight, which strengthens the role of the text information in multimodal sentiment analysis.

3. A pre-trained multimodal sentiment analysis model based on the Transformer is established. Independent pre-trained models first process the raw inputs, and the resulting text, audio and visual vectors are then fed into the neural network, which greatly reduces the overall training time. Considering the high-dimensional nature of the pre-trained features, a text transform mechanism is introduced into the multimodal task, and features from the different neural networks are combined with a dedicated attention fusion mechanism (see the third sketch below). The model is benchmarked on different datasets; the overall sentiment detection rate is analyzed to verify its effectiveness, and additional experiments on specific sentiment categories, cross-modal settings and other aspects examine its details. The final results show that the model is robust and generalizes well.
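First sketch. The abstract does not give implementation details for AT-DPCNN, so the following PyTorch code is only a minimal sketch of the described flow: an attention matrix over the word vectors, combination with the original embeddings into an attention input matrix, CNN feature extraction, and a pooling step split into segments. The class name, the way the attention matrix is formed, and all layer sizes are illustrative assumptions, not the thesis's actual design.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ATDPCNNSketch(nn.Module):
    """Illustrative sketch of the AT-DPCNN idea (not the thesis's code)."""

    def __init__(self, vocab_size, embed_dim=128, num_filters=100,
                 num_classes=2, segments=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size=3, padding=1)
        self.segments = segments                      # how many pooling segments
        self.fc = nn.Linear(num_filters * segments, num_classes)

    def forward(self, token_ids):                     # (batch, seq_len)
        x = self.embed(token_ids)                     # (batch, seq_len, embed_dim)
        # Attention matrix over the word vectors (one possible realization).
        scores = torch.bmm(x, x.transpose(1, 2)) / x.size(-1) ** 0.5
        attn = torch.softmax(scores, dim=-1)
        attended = torch.bmm(attn, x)                 # attention feature matrix
        attn_input = x + attended                     # "attention input matrix"
        feats = F.relu(self.conv(attn_input.transpose(1, 2)))  # (batch, filters, seq_len)
        # Split pooling: max-pool each segment separately, then concatenate,
        # so that contrastive (transitional) parts of a sentence are kept apart.
        chunks = torch.chunk(feats, self.segments, dim=2)   # assumes seq_len >= segments
        pooled = torch.cat([c.max(dim=2).values for c in chunks], dim=1)
        return self.fc(pooled)

# Example usage with random token ids (hypothetical vocabulary size):
# model = ATDPCNNSketch(vocab_size=10000)
# logits = model(torch.randint(0, 10000, (4, 32)))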
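Second sketch. A rough interpretation of the text-centered cross-modal fusion step in contribution 2: the text stream queries an auxiliary modality (audio or visual) through an attention layer, a gate decides how much of the cross-modal signal to keep, and a recurrent layer models long-range context. The module structure, the choice of a GRU for context, and all dimensions are assumptions made for illustration.

import torch
import torch.nn as nn

class GatedCrossModalFusion(nn.Module):
    """Illustrative text-as-core cross-modal fusion with a gating mechanism."""

    def __init__(self, dim=128, num_heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.context = nn.GRU(dim, dim, batch_first=True)   # long-range context
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, text, other):                  # each (batch, seq_len, dim)
        # Text queries the other modality (audio or visual features).
        cross, _ = self.cross_attn(text, other, other)
        # Gating mechanism: decide how much cross-modal information to keep.
        g = torch.sigmoid(self.gate(torch.cat([text, cross], dim=-1)))
        fused = g * cross + (1 - g) * text
        # Recurrent layer captures long-range contextual dependencies.
        out, _ = self.context(fused)
        return out

Keeping the text stream as the query side reflects the abstract's point that text carries the dominant emotional signal while the other modalities act as complementary context.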
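Third sketch. The pre-training-based pipeline of contribution 3 can be pictured as frozen unimodal encoders producing fixed vectors that are projected into a shared space and combined with an attention-weighted fusion before classification. The abstract does not specify the fusion details, so the projection sizes, the scalar attention scores and the class name below are purely hypothetical.

import torch
import torch.nn as nn

class PretrainedFusionSketch(nn.Module):
    """Illustrative attention fusion over pre-extracted modality vectors."""

    def __init__(self, text_dim=768, audio_dim=74, visual_dim=35,
                 hidden=128, num_classes=3):
        super().__init__()
        # One projection per modality into a shared hidden space.
        self.proj = nn.ModuleList([
            nn.Linear(text_dim, hidden),
            nn.Linear(audio_dim, hidden),
            nn.Linear(visual_dim, hidden),
        ])
        self.attn = nn.Linear(hidden, 1)             # one relevance score per modality
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, text_vec, audio_vec, visual_vec):
        # Stack the projected modality vectors: (batch, 3, hidden).
        h = torch.stack(
            [p(v) for p, v in zip(self.proj, (text_vec, audio_vec, visual_vec))],
            dim=1)
        # Attention fusion: weight the modalities by learned relevance scores.
        w = torch.softmax(self.attn(torch.tanh(h)), dim=1)   # (batch, 3, 1)
        fused = (w * h).sum(dim=1)                            # (batch, hidden)
        return self.classifier(fused)

Because the unimodal encoders run once, offline, only this small fusion head needs training, which matches the abstract's claim that pre-training greatly reduces overall training time.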
Keywords/Search Tags:Artificial intelligence, Deep learning, Neural network, Multimodal sentiment analysis, Attentional mechanism