
Sentiment Analysis Based On Multimodal Feature Fusion

Posted on: 2024-06-29    Degree: Master    Type: Thesis
Country: China    Candidate: H F Wang    Full Text: PDF
GTID: 2568307136492744    Subject: Electronic information
Abstract/Summary:
In recent years, with the rapid development of human-computer interaction, multimodal sentiment analysis has attracted increasing attention from researchers. Compared with early research on sentiment analysis of single-modal data, multimodal sentiment analysis introduces information from multiple modalities such as text, video, and audio. This approach not only alleviates the low accuracy and poor robustness of single-modal sentiment analysis, but also exploits the complementarity between modalities to strengthen the representation of emotional features, thereby improving the accuracy and reliability of sentiment analysis. Text, speech, and facial expressions are the most common and natural ways of expressing emotion, so this thesis focuses on emotional feature extraction and multimodal feature fusion for sentiment analysis based on text, speech, and facial expressions. The main work is as follows:

(1) To fully exploit the emotional details contained in word-level short-term sequences within an utterance, this thesis proposes a multimodal sentiment analysis method based on short-term feature fusion and builds a neural network model based on bidirectional gated recurrent units and a dual attention mechanism for multimodal short-term emotion classification. First, each utterance in the video is segmented at word granularity to generate the corresponding text, speech, and facial-expression short-term sequences, and pre-trained models are used to extract short-term emotional features from each of them. Second, a multi-head attention mechanism captures the global and local correlations within each single-modality short-term feature sequence, and a bidirectional gated recurrent unit captures its temporal dependencies. A mutual (cross-modal) attention mechanism then models the interrelationships between modalities to obtain the multimodal short-term fusion features, which are fed into a sentiment classifier for short-term emotion classification. Experimental results on the MOSI and MOSEI databases show that the model achieves recognition accuracies of 34.71% and 44.97%, respectively, and can effectively extract the emotional details in short-term sequences.

(2) To explore contextual emotional information in long-term sequences of text, speech, and facial expressions, this thesis proposes a multimodal sentiment analysis method based on long-term feature fusion and builds a neural network model based on long-term features and low-rank multimodal fusion for multimodal long-term emotion classification. First, the video is segmented into utterance units to generate the corresponding text, speech, and facial-expression long-term sequences, and dedicated sub-networks extract long-term emotional features from each of them. A low-rank multimodal fusion module then fuses the long-term emotional features of the three modalities into multimodal long-term fusion features, which are fed into a sentiment classifier for long-term emotion classification. Experiments on the MOSI and MOSEI databases show that the model improves recognition accuracy by 0.91 and 0.6 percentage points, respectively, over the neural network model based on bidirectional gated recurrent units and the dual attention mechanism, and can effectively extract contextual information from long-term sequences.
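As an illustration of the low-rank multimodal fusion module mentioned in (2), the following is a minimal sketch based on the standard low-rank fusion formulation: each modality vector is projected by rank-specific factors, the projections are combined by an element-wise product, and a weighted sum over the rank dimension yields the fused feature. The class name LowRankFusion, the feature dimensions, and the rank are illustrative assumptions, not the thesis's actual configuration.

```python
import torch
import torch.nn as nn

class LowRankFusion(nn.Module):
    """Fuse text, speech, and expression feature vectors with low-rank
    modality-specific factors instead of a full outer-product tensor."""

    def __init__(self, dim_t, dim_a, dim_v, out_dim, rank=4):
        super().__init__()
        self.rank = rank
        # One factor per modality; the +1 accounts for the appended constant 1
        self.factor_t = nn.Parameter(torch.randn(rank, dim_t + 1, out_dim) * 0.1)
        self.factor_a = nn.Parameter(torch.randn(rank, dim_a + 1, out_dim) * 0.1)
        self.factor_v = nn.Parameter(torch.randn(rank, dim_v + 1, out_dim) * 0.1)
        self.rank_weights = nn.Parameter(torch.randn(rank) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, z_t, z_a, z_v):
        # Append a constant 1 to each modality vector: (batch, dim + 1)
        ones = z_t.new_ones(z_t.size(0), 1)
        z_t = torch.cat([z_t, ones], dim=1)
        z_a = torch.cat([z_a, ones], dim=1)
        z_v = torch.cat([z_v, ones], dim=1)
        # Project each modality with its rank factors: (rank, batch, out_dim)
        p_t = torch.einsum('bd,rdo->rbo', z_t, self.factor_t)
        p_a = torch.einsum('bd,rdo->rbo', z_a, self.factor_a)
        p_v = torch.einsum('bd,rdo->rbo', z_v, self.factor_v)
        # Element-wise product models the cross-modal interaction,
        # then a weighted sum over the rank dimension gives the fused vector
        fused = p_t * p_a * p_v
        return torch.einsum('r,rbo->bo', self.rank_weights, fused) + self.bias

# Hypothetical usage: a batch of 32 utterances with 300-d text,
# 74-d speech, and 35-d expression long-term features
fusion = LowRankFusion(dim_t=300, dim_a=74, dim_v=35, out_dim=64, rank=4)
h = fusion(torch.randn(32, 300), torch.randn(32, 74), torch.randn(32, 35))  # (32, 64)
```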
(3) Long-term and short-term emotional features are both complementary and distinct. To exploit this complementarity and difference to improve multimodal sentiment analysis, a multimodal sentiment analysis method based on decision-level fusion of long-term and short-term features is proposed. Using the confusion matrices of the long-term and short-term emotion classifiers as prior knowledge, the long-term and short-term classification results of a test video are fused at the decision level to obtain the final emotion classification result. Experiments on the MOSI and MOSEI databases show that the proposed method effectively fuses long-term and short-term emotional information, improving recognition accuracy by 1.7 and 2.18 percentage points, respectively, over the benchmark network Deep-HOSeq.
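The abstract does not spell out the exact decision-level fusion rule in (3). The sketch below shows one plausible reading, in which each classifier's confusion matrix (estimated on validation data) is turned into P(true class | predicted class) for its predicted label and the two resulting distributions are combined; the function and variable names are hypothetical and the numbers are made up.

```python
import numpy as np

def fuse_decisions(conf_long, conf_short, pred_long, pred_short):
    """Decision-level fusion of long-term and short-term predictions
    using each classifier's confusion matrix as prior knowledge.
    conf_* are (num_classes, num_classes) count matrices with rows =
    true class and columns = predicted class, built on validation data."""
    def posterior(conf, pred):
        # P(true class | predicted class) from the predicted-class column
        col = conf[:, pred].astype(float)
        return col / (col.sum() + 1e-12)

    p_long = posterior(conf_long, pred_long)
    p_short = posterior(conf_short, pred_short)
    # Combine the two estimates and pick the most likely emotion class
    return int(np.argmax(p_long * p_short))

# Illustrative example with two emotion classes
conf_long = np.array([[80, 20], [15, 85]])
conf_short = np.array([[70, 30], [25, 75]])
final_label = fuse_decisions(conf_long, conf_short, pred_long=0, pred_short=1)
```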
Keywords/Search Tags:Multimodal sentiment analysis, Attention mechanism, Feature extraction, Feature fusion, Decision-level fusion