
Research On Sentiment Analysis Technology For Multimodal Social Data

Posted on: 2022-12-13    Degree: Doctor    Type: Dissertation
Country: China    Candidate: S Zhang    Full Text: PDF
GTID: 1488306758966079    Subject: Information and Communication Engineering
Abstract/Summary:
Advances in information and communication technologies have embedded social media in people's daily lives, and the role of users is gradually shifting from information consumers to information producers. User-generated content carrying individual sentiments and opinions spreads continuously through virtual networks and then affects behaviors and events in the real world. Sentiment analysis and recognition of social data are therefore regarded as a fundamental problem in affective computing and natural language processing. However, the development of communication technologies has changed the traditional form of social data: user-generated content is no longer limited to isolated text, but consists of multimodal data composed of textual, visual, and acoustic modalities. The emergence of multimodal social data turns sentiment analysis into a difficult cross-domain problem: it not only requires appropriate processing methods to resolve data heterogeneity, but also calls for expert knowledge from psychology and cognitive science to guide the design of model architectures and information fusion processes.

Recent studies on multimodal sentiment analysis have made some progress, but most methods neglect the problems caused by complex data patterns in specific application scenarios, including:

(1) Short text representation in the micro-blog scenario. Most studies on sentiment analysis focus only on long texts and ignore the feature sparsity and information deficiency of short texts, so they struggle to learn efficient feature representations from limited lexicons.

(2) Image-text fusion in the product review scenario. A product review consists of a textual paragraph and multiple images; the images cannot express complete emotions by themselves, but play an auxiliary role in reinforcing the textual sentiment. Current image-text sentiment analysis methods commonly assume that text and image are equally important, which does not match the realistic setting of the product review scenario.

(3) Multimodal sequential representation and fusion in the video scenario. A video can be decomposed into three sequential modalities, i.e., text, image, and audio. Each modality has sequential characteristics, and interaction relationships exist between modalities, so a model must both extract intra-modal sequential features and capture cross-modal interactions.

(4) Information control in the fusion process. Most multimodal sentiment analysis methods focus only on aggregating multi-source heterogeneous information while neglecting the selection and filtering of information from the raw input modalities. In multimodal emotion expression, each modality carries both consistency and specificity information; multimodal fusion should capture these two categories of information and filter out task-irrelevant information to learn a compact and efficient fused representation.

To address these problems, this dissertation proposes four research contributions, with improvements from the perspectives of feature representation, information fusion, and model design:

(1) For short text representation in the micro-blog scenario, a short-text sentiment analysis model based on Adversarial Variational Bayes is proposed. First, an end-to-end framework is employed to resolve the objective inconsistency between the upstream topic model and the downstream task model and to obtain a more discriminative and compact topic representation. Then, spectral normalization is introduced to alleviate the oscillation problem in adversarial training. Finally, a multi-stage fusion process is proposed to aggregate information from topic features and pre-trained word representations, which mitigates the information deficiency of short texts.

(2) For image-text fusion
in the product review scenario, a sentiment classification model based on decision diversity is proposed. The textual information is treated as the principal modality, while the visual information is exploited to locate the sentiment-bearing lexicons in the sentences, thereby modeling the image-text interactions specific to this scenario and achieving cross-modal feature-level fusion. Then, following the idea of ensemble learning, a decision fusion mechanism is proposed to aggregate the decision information from the unimodal and fused representations. Finally, a decision diversity penalty is designed to improve decision diversity and generalization ability.

(3) For multimodal sequential representation and fusion in the video scenario, a sentiment analysis model based on multi-task learning is proposed. A convolutional neural network, a bidirectional gated recurrent neural network, and a multi-head self-attention mechanism are hierarchically integrated to unify the dimension and length of unaligned sequences, extract local and global relationships, and solve the unimodal representation problem. Then, a cross-modal temporal feature fusion method is proposed to explore the bidirectional interactions between modalities and learn the fused representation. Finally, a downstream multi-task module is designed to enhance discrimination and generalization ability, in which the unimodal and cross-modal feature representations are shared among all tasks.

(4) For information control in the fusion process, a multimodal sentiment analysis model based on information decomposition and fusion is proposed. First, the latent probability distributions of the unimodal inputs in the consistency and specificity subspaces are inferred by a variational encoder, and the consistency and specificity information is explicitly decomposed via similarity and difference constraints. Then, motivated by the information bottleneck principle, the fused representation is learned under the supervised signal from the task module: its mutual information with the consistency-specificity representations is maximized to extract task-related information, while its mutual information with the raw representations is minimized to filter out irrelevant noise. Finally, to avoid information loss during decomposition and fusion, a reconstruction task is deployed as an integrity constraint on the upstream modules, recovering the original inputs from the information bottleneck representations.
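The decision-level fusion with a diversity penalty described in contribution (2) can be illustrated with a minimal NumPy sketch. This is not the dissertation's implementation: the uniform branch weighting, the cosine-similarity form of the penalty, and all function names are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def decision_fusion(logits_list, weights=None):
    """Aggregate per-branch class probabilities (decision-level fusion)."""
    probs = np.stack([softmax(l) for l in logits_list])  # (branches, batch, classes)
    if weights is None:
        weights = np.full(len(logits_list), 1.0 / len(logits_list))
    return np.tensordot(weights, probs, axes=1)          # (batch, classes)

def diversity_penalty(logits_list):
    """Mean pairwise cosine similarity of branch probability vectors.

    Minimizing this penalty pushes the unimodal and fused branches toward
    complementary (diverse) decisions, in the spirit of ensemble learning.
    """
    probs = np.stack([softmax(l) for l in logits_list])
    n, sims = probs.shape[0], []
    for i in range(n):
        for j in range(i + 1, n):
            a, b = probs[i], probs[j]
            cos = (a * b).sum(-1) / (np.linalg.norm(a, axis=-1)
                                     * np.linalg.norm(b, axis=-1))
            sims.append(cos.mean())
    return float(np.mean(sims))
```

In training, the penalty would be added to the classification loss of each branch, trading off individual accuracy against ensemble diversity.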
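The similarity and difference constraints that drive the explicit consistency-specificity decomposition in contribution (4) admit many concrete forms; one plausible sketch, assuming L2-normalized features and hypothetical function names (not the model's actual losses), is:

```python
import numpy as np

def l2norm(x):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

def decomposition_losses(consistency, specificity):
    """Similarity / difference constraints for information decomposition.

    consistency, specificity: dicts modality -> (batch, d) features.
    The similarity loss pulls the consistency features of different
    modalities together; the difference loss pushes each modality's
    consistency and specificity features toward orthogonality.
    """
    mods = list(consistency)
    sim_loss, pairs = 0.0, 0
    for i in range(len(mods)):
        for j in range(i + 1, len(mods)):
            a = l2norm(consistency[mods[i]])
            b = l2norm(consistency[mods[j]])
            sim_loss += float(np.mean(np.sum((a - b) ** 2, axis=-1)))
            pairs += 1
    diff_loss = 0.0
    for m in mods:
        a = l2norm(consistency[m])
        b = l2norm(specificity[m])
        diff_loss += float(np.mean(np.sum(a * b, axis=-1) ** 2))  # squared cosine
    return sim_loss / pairs, diff_loss / len(mods)
```

Both terms would be weighted and added to the task loss, alongside the mutual-information and reconstruction objectives the abstract describes.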
Keywords/Search Tags:Social Network, Sentiment Analysis, Multimodal Learning, Representation Learning, Information Fusion