
Cross-Modal Emotion Analysis Based On Semantic And Spatio-Temporal Dynamic Interaction

Posted on: 2024-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Xi
Full Text: PDF
GTID: 2568307157477494
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet and social media, multimodal data has grown explosively, and users increasingly rely on social media platforms to express their emotions. Accurate sentiment analysis of such media data helps government agencies mine public opinion and helps businesses obtain user feedback, supporting more informed decisions. Traditional sentiment analysis, however, suffers from insufficient intra-modal feature mining and poor inter-modal interaction. To fully exploit the diverse features of multimodal data and investigate the mechanisms of inter-modal interaction, this paper proposes a cross-modal semantic and spatio-temporal dynamic interaction network that learns dynamic fusion features within and between modalities to improve the accuracy of cross-modal sentiment analysis. The main research work is as follows:

(1) Multimodal data analysis and feature extraction. The sentiment classification accuracy of media data depends on the feature extraction method, and different types of media data have different attributes and structures, so different extraction strategies are needed. Through experimental analysis and comparison, the paper selects pre-trained BERT to extract word-vector features for the text modality, ResNet50 with transferred network parameters to obtain visual features for the image modality, and COVAREP to extract audio vector features for the sound modality, thereby fixing the final multimodal feature extractor (a minimal extraction sketch follows this abstract). This lays the foundation for the subsequent model construction.

(2) The cross-modal semantic spatio-temporal dynamic interaction network (SST-DIN) model is proposed. First, a bi-directional long short-term memory network is introduced to mine the time-series features of each modality. Meanwhile, a self-attention mechanism strengthens the weight distribution of intra-modal features, and the automatically screened feature matrix is fed into a graph convolutional network for semantic interaction. Then, features are aggregated by timestamp and the correlation coefficients of the aggregation layer are computed to obtain the fused features, realizing cross-modal spatial interaction (a sketch of one modality branch also follows this abstract). Finally, sentiment polarity classification and prediction are completed by a fully connected neural network.

(3) Model optimization and validation. First, the optimal hyperparameter values for model training are obtained through parameter-optimization experiments. Second, a multimodal feature-extraction comparison experiment verifies that the "BERT + ResNet50 + COVAREP" extractor suits the proposed model. The proposed model is then compared with six models, including QMF-Glove, MTGAT, and DEAN; on the CMU-MOSEI dataset, SST-DIN improves sentiment classification accuracy by 4.1%–16.3% and F1 score by 3.6%–15.6%. Finally, the modality ablation experiment verifies that the "Text + Video + Audio" combination outperforms single- and dual-modality combinations, and the module ablation experiment quantifies the contribution of each module in SST-DIN.

The multimodal feature extractor proposed in the paper improves the accuracy of the cross-modal joint representation, and the cross-modal semantic spatio-temporal dynamic interaction network realizes the interaction and fusion of inter-modal and intra-modal features, enhancing the overall performance of the multimodal sentiment analysis framework. This work has important theoretical significance and practical value for multimodal sentiment analysis research.
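The feature extraction in (1) can be made concrete with a minimal PyTorch sketch. The checkpoint names ("bert-base-uncased", ImageNet weights), the pooling choices, and the 74-dimensional COVAREP layout (as distributed with CMU-MOSEI) are assumptions for illustration, not choices stated in the abstract; COVAREP itself is a MATLAB toolkit, so its audio descriptors are treated here as precomputed inputs.

import torch
from transformers import BertModel, BertTokenizer
from torchvision.models import resnet50

# Text modality: pre-trained BERT word-vector features (frozen here).
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")  # assumed checkpoint
bert = BertModel.from_pretrained("bert-base-uncased").eval()

def text_features(sentence):
    tokens = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = bert(**tokens)
    return out.last_hidden_state          # (1, seq_len, 768) word vectors

# Image modality: ResNet50 with transferred ImageNet parameters; the
# final classification layer is replaced to expose the pooled feature.
cnn = resnet50(weights="IMAGENET1K_V2")
cnn.fc = torch.nn.Identity()
cnn.eval()

def visual_features(frames):
    # frames: (num_frames, 3, 224, 224), ImageNet-normalized
    with torch.no_grad():
        return cnn(frames)                # (num_frames, 2048)

# Sound modality: COVAREP descriptors (pitch, MFCCs, voice quality, ...)
# are extracted offline by the MATLAB toolkit and arrive as a
# (seq_len, 74) matrix, the layout used in CMU-MOSEI.
def audio_features(covarep_matrix):
    return covarep_matrix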
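A sketch of one SST-DIN modality branch from (2) follows, assuming a BiLSTM for time-series features, multi-head self-attention for intra-modal weighting, a single graph-convolution step (A·X·W) for semantic interaction, mean aggregation over timestamps, and a fully connected classifier. Hidden sizes, the adjacency construction, and the fusion rule are illustrative placeholders; the abstract does not specify the exact design.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityBranch(nn.Module):
    def __init__(self, in_dim, hid=128):
        super().__init__()
        self.bilstm = nn.LSTM(in_dim, hid, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hid, num_heads=4, batch_first=True)
        self.gcn_w = nn.Linear(2 * hid, 2 * hid)   # one GCN layer: A·X·W

    def forward(self, x, adj):
        # x:   (batch, seq_len, in_dim) per-modality features
        # adj: (batch, seq_len, seq_len) row-normalized semantic adjacency
        h, _ = self.bilstm(x)                 # temporal features
        h, _ = self.attn(h, h, h)             # intra-modal weight distribution
        return F.relu(self.gcn_w(adj @ h))    # semantic interaction

class SSTDINSketch(nn.Module):
    # Three branches, timestamp-aligned aggregation, FC classifier.
    def __init__(self, dims=(768, 2048, 74), hid=128, num_classes=2):
        super().__init__()
        self.branches = nn.ModuleList([ModalityBranch(d, hid) for d in dims])
        self.classifier = nn.Linear(3 * 2 * hid, num_classes)

    def forward(self, feats, adjs):
        # feats/adjs: lists of three tensors aligned on shared timestamps
        pooled = [branch(x, a).mean(dim=1)    # aggregate over time steps
                  for branch, x, a in zip(self.branches, feats, adjs)]
        fused = torch.cat(pooled, dim=-1)     # cross-modal fusion
        return self.classifier(fused)

Mean pooling stands in for the correlation-weighted aggregation described in the abstract; a learned correlation coefficient per timestamp would replace it in a faithful implementation.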
Keywords/Search Tags: Cross-modal sentiment analysis, Multimodal feature extraction, Semantic interaction, Spatio-temporal interaction, Graph convolutional network