
Research On Joint Visual-Textual Sentiment Analysis Based On Attention Mechanism

Posted on: 2020-07-10
Degree: Master
Type: Thesis
Country: China
Candidate: X L Zhu
Full Text: PDF
GTID: 2428330623459871
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology and the growing popularity of the mobile Internet, people's lifestyles and ways of communicating have changed dramatically. More and more people use social network accounts to express their opinions in the form of text and images, turning the Internet into a pool of opinions and emotions covering a wide range of topics. Sentiment analysis of social media data is crucial for understanding the individual behavior of users, and it can provide an effective reference for political elections, stock market analysis, film box-office prediction, mental health care, and online word-of-mouth marketing, which gives it important practical value. Compared with single-modal data such as text or images alone, jointly considering multi-modal data supplies complementary information for sentiment analysis and reflects a user's emotional tendency more accurately.

Traditional sentiment analysis has focused mainly on single-modal data such as images or text. In recent years, researchers have begun to study joint image-text sentiment analysis, but existing work relies only on simple feature-level or decision-level fusion, which struggles to reach satisfactory classification accuracy on complex and variable multimodal social media data. It is necessary to model both the semantic alignment relationships within image-text pairs and the unequal contributions of the two modalities, as both play an important role in improving sentiment analysis performance.

In view of these shortcomings, and inspired by the characteristics of the image-text pairs that users publish on social networks, this thesis designs two models that improve joint visual-textual sentiment analysis based on deep neural networks and the attention mechanism. For a given image-text pair, the specific work is as follows:

1) Considering that the image and the text contribute unequally to joint sentiment analysis, a cross-modal attention mechanism based on sentimental context is proposed. It cooperates with a bidirectional recurrent neural network to assign different weights to the image and text features, measuring each modality's contribution to the pair's joint sentiment. The joint feature is then obtained as a weighted sum of the image and text features, and a classifier is trained on it for efficient sentiment classification (an illustrative sketch of this weighting step follows below).
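The abstract names this mechanism but not its formulation, so the following is only a minimal PyTorch sketch of how a sentiment-context-driven cross-modal weighting could look. The class name CrossModalAttention, the feature dimensions, and the learned context vector are illustrative assumptions, not the thesis's actual implementation.

```python
# Hypothetical sketch of a cross-modal attention weighting step (PyTorch).
# Dimensions, layer choices, and the learned "sentiment context" vector are
# assumptions for illustration; the thesis abstract gives no exact formulas.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalAttention(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=512, joint_dim=512):
        super().__init__()
        # Project both modalities into a shared space.
        self.img_proj = nn.Linear(img_dim, joint_dim)
        self.txt_proj = nn.Linear(txt_dim, joint_dim)
        # Learned sentiment-context vector that scores each modality.
        self.context = nn.Parameter(torch.randn(joint_dim))

    def forward(self, img_feat, txt_feat):
        # img_feat: (batch, img_dim), e.g. a CNN pooling-layer output
        # txt_feat: (batch, txt_dim), e.g. the final state of a BiRNN
        h_img = torch.tanh(self.img_proj(img_feat))      # (batch, joint_dim)
        h_txt = torch.tanh(self.txt_proj(txt_feat))      # (batch, joint_dim)
        h = torch.stack([h_img, h_txt], dim=1)           # (batch, 2, joint_dim)
        # One scalar score per modality, softmax-normalized into weights.
        scores = h @ self.context                        # (batch, 2)
        weights = F.softmax(scores, dim=1)               # modality contributions
        # Joint feature = weighted sum of the projected modality features.
        joint = (weights.unsqueeze(-1) * h).sum(dim=1)   # (batch, joint_dim)
        return joint, weights

# A linear classifier over the joint feature would complete the model, e.g.:
# logits = nn.Linear(512, num_classes)(joint)
```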
2) Considering that multi-grained alignment relationships exist between image regions and the text at different levels, such as words and phrases, a bilinear attention mechanism is used to measure these correlations, and an attention transfer learning mechanism is proposed for the first time to improve the accuracy of this measurement (an illustrative sketch of the bilinear scoring appears at the end of this abstract). Multi-modal convolutional neural networks are then designed to capture the multi-grained semantic matching relations and sentimental interaction features between the two modalities, yielding a global joint representation of the image-text pair at each level. Finally, the joint representations of all levels are combined to train a classifier that analyzes the joint visual-textual sentiment more accurately.

To validate the proposed models, two large-scale image-text datasets are constructed from the Flickr and GettyImages social networking sites, respectively. The proposed models are trained on these datasets and compared with existing models on different metrics. The experimental results show that on sentimentally inconsistent image-text pairs the model based on the cross-modal attention mechanism has a more pronounced classification advantage, while the model based on the multi-grained image-text attention mechanism achieves state-of-the-art results on the joint visual-textual sentiment analysis task. The results also show that phrases are better suited than individual words for interacting with local regions of the image.
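Likewise, the bilinear attention between image regions and word- or phrase-level text units can be sketched as the standard bilinear form s_ij = r_i^T W t_j. Everything below (names, dimensions) is an assumption for illustration; the attention transfer learning step is omitted because the abstract gives no detail on it.

```python
# Hypothetical sketch of bilinear attention between image regions and text
# units (words or phrases), in PyTorch. This is the generic bilinear score
# s_ij = r_i^T W t_j, not the thesis's exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BilinearAlignment(nn.Module):
    def __init__(self, region_dim=2048, text_dim=512):
        super().__init__()
        # W in the bilinear score s_ij = r_i^T W t_j.
        self.W = nn.Parameter(torch.randn(region_dim, text_dim) * 0.01)

    def forward(self, regions, text_units):
        # regions:    (batch, n_regions, region_dim), e.g. CNN feature-map cells
        # text_units: (batch, n_units, text_dim), word or phrase embeddings
        # Bilinear correlation matrix between every region and every text unit.
        scores = regions @ self.W @ text_units.transpose(1, 2)  # (batch, n_regions, n_units)
        # Normalize over regions: for each word/phrase, a distribution over
        # the image regions it aligns with.
        attn = F.softmax(scores, dim=1)
        # Text-unit-specific visual context vectors.
        attended = attn.transpose(1, 2) @ regions  # (batch, n_units, region_dim)
        return attended, attn
```

A multi-modal convolutional network, as described in the abstract, would then consume these aligned region-text features at each granularity to produce the per-level joint representations.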
Keywords/Search Tags: multimodal learning, attention mechanism, sentiment analysis, neural network, social media