
Research On Sentiment Classification For Microblogging Based On Multimodal Data

Posted on: 2018-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: J X Tan
GTID: 2428330512998178
Subject: Computer technology
Abstract/Summary:
Microblogging is one of the main channels through which people quickly express their opinions and learn about those of others. Microblogs contain abundant multimodal data that reflect users' feelings, such as texts and images, and sentiment classification of microblog data is valuable for applications such as election forecasting and social recommendation. More and more users now publish posts that combine images with short texts, and classifying the sentiment of such posts accurately is a new challenge for researchers. Using microblog texts and their corresponding images, which we crawled ourselves, together with a public dataset, we address the sentiment classification of microblogs that consist only of images and short texts. Our main contributions are as follows.

First, to classify the sentiment of images accompanied by short texts, we propose a new algorithm based on multiple kernel learning. The method maps text and image features into kernel space and fuses them more effectively at the feature level, exploiting the consistency of sentiment between the text and the image of the same microblog as well as the redundancy and dissimilarity of their feature spaces. It alleviates the feature sparsity caused by short texts and searches for a subspace shared by texts and images in the kernel space, which improves the performance of the overall classifier on both modalities. We compare the proposed method with other state-of-the-art sentiment classification approaches on a Sina Weibo dataset of texts and images that we crawled and labeled ourselves, and on a public dataset. The results show that our method fuses texts and images more effectively and achieves better performance on most evaluation criteria.

Second, comments are one of the most important data sources. To exploit comment data for better sentiment classification of microblog content (text and image), we propose a joint topic model of content and comments. Specifically, we use BFGM-LDA (Bayes Finite Gaussian-multinomial LDA) to jointly model texts and images, and the LDA topic model to model comments. We also introduce a latent variable that describes the correlation between the sentiment of the comments and that of the content, which is later used to fuse the two. Because comments are generated in a probabilistic manner from the joint topic distribution of texts and images, the proposed method alleviates the problems caused by short texts. We conduct experiments on a Sina Weibo dataset of texts, images and comments that we crawled and labeled ourselves, and on a public dataset. The results verify that our method fuses comment data and content more effectively, and that comment data helps improve the sentiment classification of microblogs with images and short texts.

Third, to validate the proposed methods thoroughly, we wrote a program named DistributedWeiboSpider that crawls Sina Weibo data. It is built on Scrapy, an open-source crawling framework for Python, and Redis, an in-memory database. With this application we crawled a large amount of microblog data, including texts, images and comments, which we then filtered and labeled in a reasonable way. The resulting dataset has two labels, positive and negative, and contains 6000 positive and 4008 negative microblogs. Experiments on this dataset show that the proposed methods achieve better performance not only on the binary classification problem but also on the multi-class problem.
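The abstract does not give the exact multiple kernel learning formulation of the first contribution, so the following is only a minimal sketch of the general idea of fusing text and image features in kernel space, not the thesis's algorithm. It assumes hypothetical dense feature vectors for the text and the image of each post, builds one RBF kernel per modality, and fuses them as K = β_text K_text + β_image K_image. Instead of learning the weights jointly with the classifier, it sets them by kernel-target alignment (a simple two-stage MKL heuristic), and it uses a kernel perceptron in place of an SVM to stay dependency-free. All function names, the toy data and `gamma` are illustrative assumptions; the sketch does not model the shared subspace or the text/image consistency described above.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def rbf_kernel(xs: Sequence[Sequence[float]], ys: Sequence[Sequence[float]], gamma: float) -> Matrix:
    """Gaussian (RBF) kernel matrix: K[i][j] = exp(-gamma * ||xs[i] - ys[j]||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y))) for y in ys] for x in xs]


def alignment(K: Matrix, labels: Sequence[int]) -> float:
    """Uncentered kernel-target alignment <K, yy^T>_F / (||K||_F * ||yy^T||_F)."""
    n = len(labels)
    num = sum(K[i][j] * labels[i] * labels[j] for i in range(n) for j in range(n))
    k_norm = math.sqrt(sum(v * v for row in K for v in row))
    # For labels in {-1, +1}, ||yy^T||_F equals n.
    return num / (k_norm * n)


def alignment_weights(kernels: Sequence[Matrix], labels: Sequence[int]) -> List[float]:
    """Non-negative kernel weights proportional to each kernel's alignment, summing to 1."""
    raw = [max(alignment(K, labels), 0.0) for K in kernels]
    total = sum(raw)
    if total == 0.0:
        return [1.0 / len(kernels)] * len(kernels)
    return [r / total for r in raw]


def combine(kernels: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Weighted sum of kernel matrices that all have the same shape."""
    rows, cols = len(kernels[0]), len(kernels[0][0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(cols)]
            for i in range(rows)]


def train_kernel_perceptron(K: Matrix, labels: Sequence[int], epochs: int = 20) -> List[int]:
    """Dual perceptron: alpha[i] counts the mistakes made on training sample i."""
    n = len(labels)
    alpha = [0] * n
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            score = sum(alpha[j] * labels[j] * K[j][i] for j in range(n))
            if labels[i] * score <= 0:
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:  # converged: every training sample is classified correctly
            break
    return alpha


def predict(K_test: Matrix, alpha: Sequence[int], labels: Sequence[int]) -> List[int]:
    """K_test[t][j] is the fused kernel between test sample t and training sample j."""
    return [1 if sum(a * y * k for a, y, k in zip(alpha, labels, row)) > 0 else -1
            for row in K_test]


# Hypothetical toy features: 2-D "text" and "image" vectors; +1 = positive, -1 = negative.
text_train = [[1.0, 0.0], [0.9, 0.2], [0.0, 1.0], [0.1, 0.8]]
image_train = [[0.8, 0.1], [0.7, 0.3], [0.2, 0.9], [0.3, 0.7]]
labels = [1, 1, -1, -1]
gamma = 2.0

kernels = [rbf_kernel(text_train, text_train, gamma), rbf_kernel(image_train, image_train, gamma)]
weights = alignment_weights(kernels, labels)
alpha = train_kernel_perceptron(combine(kernels, weights), labels)

# A new post: kernels of its text and image against the training set, fused with the same weights.
text_test, image_test = [[0.95, 0.1]], [[0.75, 0.2]]
K_test = combine([rbf_kernel(text_test, text_train, gamma),
                  rbf_kernel(image_test, image_train, gamma)], weights)
print(weights, predict(K_test, alpha, labels))
```

The fused matrix remains a valid kernel because a non-negative combination of positive semidefinite kernels is itself positive semidefinite. A full MKL formulation would learn the weights jointly with the classifier (for example, an SVM) rather than fixing them beforehand with `alignment_weights`.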
Keywords/Search Tags: Microblogging, Sentiment Classification, Multimodal Fusion, Multiple Kernel Learning, Graphical Models