
Design And Research Of New Media Sentiment Analysis Method Based On Multimodality

Posted on: 2023-10-13    Degree: Master    Type: Thesis
Country: China    Candidate: Q F Qi    Full Text: PDF
GTID: 2558307058467174    Subject: Control engineering
Abstract/Summary:
With the rapid development of new online media such as Douyin and Kuaishou, more and more netizens respond to events by uploading videos that express their views and convey their emotions. The content of a video can be summarized as three types of multimodal data: text in the form of spoken language, vision in the form of gestures and facial expressions, and audio in the form of intonation and rhythm. Compared with a single modality, multimodal data conveys information that is multi-dimensional, multi-category, and deeper in level, so analyzing multimodal data helps us obtain a more accurate understanding of sentiment. However, for the problem of data fusion in multimodal sentiment analysis, previous studies have too often simply concatenated feature vectors along the time dimension, which by default assigns the same weight to every modality. This approach ignores how strongly each modality transmits information at a given moment and cannot obtain an associated embedding representation of multiple modalities in the time dimension. Further research has also found that the meaning of words in the text changes dynamically with the visual and acoustic modalities, which can strengthen or weaken the emotional properties of the words themselves. To this end, we propose two multimodal data fusion models for the problems encountered in multimodal sentiment analysis tasks.

To address the problem that feature-vector concatenation cannot obtain an associated embedding representation of multiple modalities in the time dimension, Model 1 is proposed: a multimodal sequence feature extraction network based on a self-attention mechanism and neural networks. The network uses self-attention to enhance contextual information within a single modality and across modalities over time, yielding embedded representations that reflect the different strengths of the modalities. Its core is to perform sequence reorganization and modal enhancement on the multimodal sequence data and then directly extract the fused feature representation of the information-enhanced multimodal sequence.

To address the fact that feature-vector concatenation cannot model how the meaning of the text is dynamically shifted by the visual and acoustic modalities, Model 2 is proposed: a multimodal encoding-decoding network based on the Transformer. The model encodes the multimodal data with a BERT network and a Transformer encoder to resolve long-term dependencies within each modality. During multimodal fusion, it takes the text data as the dominant information and the voice and facial visual data as auxiliary information, and reconstructs the Transformer decoder to iteratively and dynamically update the weights of the multimodal data. The network fully accounts for the long-term dependencies between modalities and for the offset effects of the voice and facial visual data on the textual data.

To evaluate the models, this paper conducts experiments on two widely used multimodal sentiment analysis datasets and compares the proposed models with state-of-the-art benchmark models. The results show that the proposed models significantly outperform the benchmarks and successfully address the two problems described above.
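As a minimal sketch (assuming PyTorch-style modules; all class names, dimensions, and hyperparameters here are illustrative rather than taken from the thesis), the two fusion ideas can be expressed as follows: a `SelfAttentionFusion` module reorganizes the projected unimodal sequences into one multimodal sequence and enhances it with self-attention (the Model 1 idea), while a `TextDominantDecoderLayer` lets text queries attend to voice and facial visual keys and values so that the auxiliary modalities offset the text representation, with stacked layers iteratively updating the fusion weights (the Model 2 idea).

```python
# Illustrative sketch only: hypothetical module names, dimensions, and
# hyperparameters, not the thesis's actual implementation.
import torch
import torch.nn as nn


class SelfAttentionFusion(nn.Module):
    """Model 1 idea: reorganize the unimodal sequences into one multimodal
    sequence, enhance it with self-attention, and pool a fused feature."""

    def __init__(self, text_dim, audio_dim, visual_dim, d_model=128, n_heads=4):
        super().__init__()
        # project each modality into a shared embedding space
        self.proj_t = nn.Linear(text_dim, d_model)
        self.proj_a = nn.Linear(audio_dim, d_model)
        self.proj_v = nn.Linear(visual_dim, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, text, audio, visual):
        # each input: (batch, seq_len, modality_dim)
        seq = torch.cat([self.proj_t(text),
                         self.proj_a(audio),
                         self.proj_v(visual)], dim=1)    # sequence reorganization
        enhanced, _ = self.attn(seq, seq, seq)           # modal/context enhancement
        return self.norm(seq + enhanced).mean(dim=1)     # fused feature (batch, d_model)


class TextDominantDecoderLayer(nn.Module):
    """Model 2 idea: text queries attend to audio/visual keys and values,
    so the auxiliary modalities offset the dominant text representation."""

    def __init__(self, d_model=128, n_heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model),
                                 nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, text, audio_visual):
        # text: (batch, T_text, d_model), e.g. BERT/Transformer-encoder outputs
        # audio_visual: (batch, T_av, d_model), encoded voice + facial features
        x = self.norm1(text + self.self_attn(text, text, text)[0])
        x = self.norm2(x + self.cross_attn(x, audio_visual, audio_visual)[0])
        return self.norm3(x + self.ffn(x))               # updated text-dominant fusion
```

Stacking several such decoder layers and attaching a small classification or regression head would re-weight the modalities iteratively, which is how the abstract describes the reconstructed decoder; the actual thesis architecture may include components not shown in this sketch.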
Keywords/Search Tags: Multimodal sentiment analysis, Attention mechanism, Feature fusion, Neural network