In recent years, more and more users have begun to express their emotions through multi-modal media such as text-image pairs and short videos. However, traditional sentiment analysis models are usually designed for a single modality such as text or images, so they cannot effectively exploit multi-modal information, and their classification accuracy on multi-modal data is therefore limited. By taking the information of different modalities as input, deep learning can automatically mine the emotional cues of each modality and effectively combine them, thereby improving sentiment analysis. Focusing on innovations in intra-modal emotion feature modeling and cross-modal feature fusion, this thesis studies multi-modal sentiment analysis for text-image pairs and for video. The main research contents are as follows:

(1) To address the problem that existing image sentiment analysis models consider only the relationship between high-level image features and text features while ignoring lower-level image features, a text-image sentiment analysis model based on multi-layer cross-modal attention fusion (MAFSA) is proposed. First, a VGG network with multi-layer convolutions is used to obtain image features at different levels, and BERT word embeddings with a Bi-GRU are used to obtain text emotion features. To make the model focus on the image information that is relevant to the text content, the extracted multi-layer image features are fused with the text features to obtain several groups of single-layer text-image attention fusion features, which are then weighted by an attention network. Finally, the resulting multi-layer text-image attention fusion features are fed into a fully connected layer to obtain the classification result. Experimental results show that, compared with baseline models, MAFSA achieves higher accuracy and F1 score, effectively improving the performance of text-image sentiment classification.

(2) To address the problems that uni-modal feature heterogeneity is hard to preserve during feature extraction in video sentiment analysis and that cross-modal fusion introduces feature redundancy, a video sentiment analysis model based on multi-task learning and a cascade Transformer (MTSA) is proposed. MTSA uses LSTMs within a multi-task learning framework to extract uni-modal contextual semantic information; by accumulating the losses of the auxiliary uni-modal tasks, noise is removed and modal feature heterogeneity is preserved. A multi-task gating mechanism adjusts the cross-modal feature fusion, and the text, audio, and visual features are fused in a cascade Transformer structure to increase the fusion depth and avoid redundant fused features. GradNorm and sub-task weight decay are used to optimize the multi-task losses and balance multi-task training. Experimental results show that MTSA effectively improves the performance of video sentiment analysis.
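
The following is a minimal sketch of the multi-layer cross-modal attention fusion idea described for MAFSA: text features attend to image features drawn from several network levels, the per-level fused features are weighted by an attention network, and the weighted sum is classified. All module names, dimensions, and the exact attention formulation are assumptions for illustration, not the thesis implementation.

```python
# Sketch (assumed, not the thesis code): multi-layer text-image attention fusion.
import torch
import torch.nn as nn


class MultiLayerCrossModalFusion(nn.Module):
    def __init__(self, text_dim=768, image_dims=(128, 256, 512), hidden=256, num_classes=3):
        super().__init__()
        # Project each level of image features and the text features into a shared space.
        self.img_proj = nn.ModuleList(nn.Linear(d, hidden) for d in image_dims)
        self.txt_proj = nn.Linear(text_dim, hidden)
        # Per-level cross-modal attention: text queries attend to one level of image features.
        self.cross_attn = nn.ModuleList(
            nn.MultiheadAttention(hidden, num_heads=4, batch_first=True) for _ in image_dims
        )
        # Layer-level attention network that weights the fused features from each level.
        self.layer_attn = nn.Linear(hidden, 1)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, text_feat, image_feats):
        # text_feat: (B, T, text_dim) token features, e.g. BERT embeddings passed through a Bi-GRU.
        # image_feats: list of (B, N_i, image_dims[i]) region features from different VGG levels.
        q = self.txt_proj(text_feat)                        # (B, T, hidden)
        fused_levels = []
        for proj, attn, img in zip(self.img_proj, self.cross_attn, image_feats):
            kv = proj(img)                                  # (B, N_i, hidden)
            fused, _ = attn(q, kv, kv)                      # text-guided attention over image regions
            fused_levels.append(fused.mean(dim=1))          # (B, hidden) fused feature per level
        stacked = torch.stack(fused_levels, dim=1)          # (B, L, hidden)
        w = torch.softmax(self.layer_attn(stacked), dim=1)  # weight each level
        pooled = (w * stacked).sum(dim=1)                   # (B, hidden)
        return self.classifier(pooled)                      # sentiment logits
```

Similarly, the sketch below illustrates one possible reading of the cascade Transformer fusion with gated cross-modal features described for MTSA. The cascade order, the gating form, and the dimensions are assumptions; the uni-modal LSTM encoders, auxiliary task losses, and GradNorm balancing are omitted.

```python
# Sketch (assumed, not the thesis code): gated cascade Transformer fusion of text, audio, visual.
import torch
import torch.nn as nn


class CascadeTransformerFusion(nn.Module):
    def __init__(self, dim=128, num_classes=3):
        super().__init__()
        make_layer = lambda: nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.stage1 = make_layer()  # first stage: fuse text with audio
        self.stage2 = make_layer()  # second stage: fuse the result with visual features
        # Simple gates that scale each auxiliary modality before it enters the cascade.
        self.gate_a = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.gate_v = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, text, audio, visual):
        # text / audio / visual: (B, T, dim) sequences already encoded per modality (e.g. by LSTMs).
        a = self.gate_a(audio) * audio
        v = self.gate_v(visual) * visual
        ta = self.stage1(torch.cat([text, a], dim=1))   # fuse text and gated audio
        tav = self.stage2(torch.cat([ta, v], dim=1))    # fuse the result with gated visual features
        return self.classifier(tav.mean(dim=1))         # sentiment prediction
```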