
Research On Algorithms For Multimodal Sentiment Analysis Based On The Attention Mechanism

Posted on: 2021-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: J R Zhou
Full Text: PDF
GTID: 2428330602498992
Subject: Computer application technology

Abstract/Summary:
Sentiment analysis has long been an important subject in data mining. The task originated as text sentiment analysis in natural language processing, that is, recognizing the sentiment expressed by textual content. With the development of social media, e-commerce, and other internet applications, the targets of sentiment analysis are no longer limited to text: in many circumstances, short videos and text-image combinations are the objects of analysis. Most e-commerce websites allow users to upload short videos, images, and text when writing reviews. Tasks that analyze sentiment using data from more than one modality are referred to as multimodal sentiment analysis. The central issue in multimodal sentiment analysis is how to fuse data from different modalities so that they augment one another and ultimately increase the accuracy of sentiment recognition.

This dissertation is motivated by the huge volume of multimodal data generated on the internet every day. The research target is multimodal online reviews from influential review websites. We focus on the two main forms of online reviews: comment text accompanied by images, and videos in which customers directly state their opinions. We propose two models to handle these two data types separately: first, a self-attention-based neural network for mining sentiment information in user-generated videos; second, a hierarchical attention neural network for analyzing the sentiment of documents with attached images. The main work and contributions are summarized as follows:

(1) A multimodal fusion algorithm based on self-attention and neural networks. User-generated videos contain three modalities: text, visual, and acoustic. We adopt a self-attention layer to encode the textual data, which carries rich and explicit semantic information. We then apply attention over the utterances in a video and over the modalities to mine utterance-level and modality-level interactions. The model is built with multilevel neural networks, and its effectiveness is demonstrated through experiments on a multimodal dataset widely used in related work.

(2) A text-image fusion algorithm for multimodal sentiment analysis based on hierarchical attention networks. The most common data type on major online review websites is a document accompanied by a small number of images. To analyze the sentiment of such combinations, we propose a hierarchical attention network that improves the accuracy of document sentiment analysis by exploiting the additional visual information. We treat the images as attention signals for identifying important sentences in the document: a sentence is considered important if it is related to the attached images. To prevent semantic connections among words from being lost during encoding, we adopt a self-attention layer that encodes word representation vectors while retaining contextual information. Meanwhile, we adopt a dense layer that combines visual aspect attention with sentence attention, so that important sentences unrelated to the attached images are not ignored. The effectiveness of this model is demonstrated through experiments on a real dataset of restaurant reviews.
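The two attention components at the core of both models can be sketched in plain NumPy: scaled dot-product self-attention for encoding a sequence of word or utterance vectors, and an image-guided attention that weights sentences by their affinity with a visual feature vector. The function names, dimensions, and random initialization below are illustrative assumptions for exposition, not the actual architecture or parameters of the thesis.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Scaled dot-product self-attention over a sequence of
    # word/utterance vectors X with shape (seq_len, d).
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # pairwise affinities
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V, weights

def visual_aspect_attention(sent_vecs, img_vec):
    # Weight sentences by their affinity with an image feature
    # vector, then pool them into one document representation.
    scores = sent_vecs @ img_vec              # (num_sentences,)
    weights = softmax(scores)
    return weights @ sent_vecs, weights

# Toy example: 4 utterances with 8-dim features (hypothetical sizes).
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
encoded, attn = self_attention(X, Wq, Wk, Wv)

# Toy example: 5 sentence vectors attended by one image vector.
sents = rng.standard_normal((5, 8))
img = rng.standard_normal(8)
doc_vec, sent_weights = visual_aspect_attention(sents, img)
```

In the full models these attention weights feed further neural layers (for instance, the dense layer that merges visual aspect attention with plain sentence attention); the sketch only shows how the attention distributions themselves are computed.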
Keywords/Search Tags:Multimodal data, Online review, Sentiment analysis, Attention mechanism, Neural network