Sentiment analysis is a hot topic in natural language, image, and video processing research. In recent years, with the development of artificial intelligence and the arrival of the big-data era, social media such as blogs, Weibo, forums, e-commerce platforms, and news websites have become important platforms for people to express emotions and obtain information, producing massive amounts of text, pictures, and videos that carry personal emotional tendencies. Research on these data has a positive effect on improving platform service quality, helping merchants promote and sell products, monitoring public opinion, and making personalized recommendations. The earliest researchers focused mainly on text data. Compared with a single modality, multimodal data such as pictures can supply sentiment analysis with information from more dimensions, so more and more researchers have begun to improve sentiment analysis performance through richer sources of sentiment information. For cross-modal sentiment analysis based on deep learning, the main work of this thesis is as follows:

(1) An aspect-level text sentiment analysis method based on syntactic features and a self-attention mechanism is proposed. First, the word vector representation of the input text sequence is extracted and fed into a Bi-LSTM network to obtain the hidden state representation of the text. Then, on one hand, syntactic dependencies between words are established, and syntactic features are extracted by a gated graph convolutional network and local average pooling over the aspect words. On the other hand, the hidden states pass in sequence through position encoding, a multi-head self-attention mechanism, and a global max pooling layer to extract context information and key sentiment words. Finally, the extracted features are fused and classified (see the first sketch below). Experiments show that the proposed network model is superior in both accuracy and Macro-F1, and the importance of low-level text feature representation for sentiment classification is demonstrated using two different word vector representation methods.

(2) An image sentiment analysis method based on multi-scale feature fusion and an attention mechanism is proposed. The method first uses a ResNeXt101 network to obtain features from different convolutional layers of the input image, then adjusts the feature sizes by downsampling and related operations before splicing and fusing them. Second, a dual attention module guides the selection of emotional features along two independent dimensions, spatial and channel. Finally, the information is extracted, integrated, and classified through global average pooling and a fully connected layer to perform the final sentiment recognition (see the second sketch below). Experimental results show that this method performs excellently on three public datasets.
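The first sketch below outlines the two-branch aspect-level model in (1) in PyTorch. It is a minimal illustration, not the thesis code: the layer sizes, the learned position encoding, and the sigmoid-tanh form of the graph-convolution gate are all assumptions made for the example.

    # Minimal PyTorch sketch of method (1); names, sizes, and the gating
    # form are illustrative assumptions, not the thesis implementation.
    import torch
    import torch.nn as nn

    class GatedGraphConv(nn.Module):
        """One gated graph-convolution layer over a dependency adjacency matrix."""
        def __init__(self, dim):
            super().__init__()
            self.feat = nn.Linear(dim, dim)   # candidate features
            self.gate = nn.Linear(dim, dim)   # sigmoid gate (assumed form)

        def forward(self, h, adj):            # h: (B, T, D), adj: (B, T, T)
            deg = adj.sum(-1, keepdim=True).clamp(min=1)
            agg = adj @ h / deg               # mean over syntactic neighbours
            return torch.sigmoid(self.gate(agg)) * torch.tanh(self.feat(agg))

    class AspectTextNet(nn.Module):
        def __init__(self, vocab, emb_dim=300, hid=128, heads=4, classes=3):
            super().__init__()
            self.emb = nn.Embedding(vocab, emb_dim)
            self.bilstm = nn.LSTM(emb_dim, hid, batch_first=True, bidirectional=True)
            self.gcn = GatedGraphConv(2 * hid)
            self.attn = nn.MultiheadAttention(2 * hid, heads, batch_first=True)
            self.pos = nn.Parameter(torch.zeros(512, 2 * hid))  # learned position encoding
            self.cls = nn.Linear(4 * hid, classes)

        def forward(self, tokens, adj, aspect_mask):
            # tokens: (B, T) ids; adj: (B, T, T) dependency arcs;
            # aspect_mask: (B, T) float, 1.0 on aspect-word positions
            h, _ = self.bilstm(self.emb(tokens))              # (B, T, 2H) hidden states
            # Branch 1: syntax -- gated GCN, then average-pool over aspect positions only
            g = self.gcn(h, adj)
            m = aspect_mask.unsqueeze(-1)
            syn = (g * m).sum(1) / m.sum(1).clamp(min=1)      # (B, 2H)
            # Branch 2: context -- position encoding + self-attention + global max pooling
            q = h + self.pos[: h.size(1)]                     # assumes T <= 512
            a, _ = self.attn(q, q, q)
            ctx = a.max(dim=1).values                         # (B, 2H)
            return self.cls(torch.cat([syn, ctx], dim=-1))    # fuse branches and classify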
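The second sketch illustrates the multi-scale image pipeline in (2). The choice of backbone stages, the downsampling to the smallest feature grid, and the CBAM-style form of the dual (channel + spatial) attention are assumptions for the example; the thesis may differ in these details.

    # Minimal PyTorch sketch of method (2): multi-scale ResNeXt101 features
    # fused, then a CBAM-style dual attention (assumed form), then GAP + FC.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torchvision.models import resnext101_32x8d
    from torchvision.models.feature_extraction import create_feature_extractor

    class DualAttention(nn.Module):
        def __init__(self, ch, r=16):
            super().__init__()
            self.mlp = nn.Sequential(nn.Linear(ch, ch // r), nn.ReLU(), nn.Linear(ch // r, ch))
            self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

        def forward(self, x):                                     # x: (B, C, H, W)
            # Channel attention from globally pooled statistics
            w = torch.sigmoid(self.mlp(x.mean(dim=(2, 3))))       # (B, C)
            x = x * w[:, :, None, None]
            # Spatial attention from per-pixel mean/max channel statistics
            s = torch.cat([x.mean(1, keepdim=True), x.max(1, keepdim=True).values], 1)
            return x * torch.sigmoid(self.spatial(s))

    class ImageSentimentNet(nn.Module):
        def __init__(self, classes=8):
            super().__init__()
            backbone = resnext101_32x8d(weights=None)  # pretrained weights in practice
            self.features = create_feature_extractor(
                backbone, return_nodes={"layer2": "c3", "layer3": "c4", "layer4": "c5"})
            ch = 512 + 1024 + 2048                     # channels of the three stages
            self.attn = DualAttention(ch)
            self.head = nn.Linear(ch, classes)

        def forward(self, img):                        # img: (B, 3, 224, 224)
            f = self.features(img)
            size = f["c5"].shape[-2:]                  # downsample all maps to one grid
            maps = [F.adaptive_avg_pool2d(f[k], size) for k in ("c3", "c4", "c5")]
            x = self.attn(torch.cat(maps, dim=1))      # splice multi-scale features, attend
            return self.head(x.mean(dim=(2, 3)))       # global average pooling + FC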
(3) Building on the single-modality sentiment analysis research, a cross-modal sentiment analysis algorithm for image-text fusion based on a hybrid fusion strategy is proposed. An advantage of this approach is that when a multimodal data source is incomplete and one modality is missing, the analysis can continue using the remaining modalities. First, the spatial attention module of the image sentiment analysis network is improved to incorporate aspect-word information. Then image and text features are extracted separately and fused in a feature-fusion layer, and a loss function is designed in which a KL-divergence term adds a penalty for disagreement between the features of different modalities, so that consistent classification results are obtained for the same aspect word. Finally, the classification results of the single-modality branches and the feature-fusion layer are fused at the decision-making layer to obtain the final cross-modal sentiment classification result (see the third sketch below). The experimental results demonstrate the effectiveness of the method, which improves the accuracy of sentiment recognition by fusing data from different modalities.
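The third sketch shows the hybrid fusion in (3) at the classifier level, taking the text and image features as given. The linear heads, the averaging used for decision-level fusion, the exact direction of the KL term, and the weight lam are assumed forms of the consistency loss and fusion described above.

    # Minimal sketch of method (3): feature-level fusion plus decision-level
    # fusion, with a KL-divergence consistency penalty (assumed formulation).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class HybridFusionNet(nn.Module):
        def __init__(self, txt_dim=512, img_dim=512, classes=3):
            super().__init__()
            self.txt_head = nn.Linear(txt_dim, classes)       # text-only classifier
            self.img_head = nn.Linear(img_dim, classes)       # image-only classifier
            self.fuse_head = nn.Sequential(                   # feature-fusion classifier
                nn.Linear(txt_dim + img_dim, 256), nn.ReLU(), nn.Linear(256, classes))

        def forward(self, txt_feat, img_feat):
            p_txt = self.txt_head(txt_feat)
            p_img = self.img_head(img_feat)
            p_fuse = self.fuse_head(torch.cat([txt_feat, img_feat], dim=-1))
            # Decision-level fusion: average the three class distributions.
            # If one modality is missing, the remaining branches can still vote.
            probs = (p_txt.softmax(-1) + p_img.softmax(-1) + p_fuse.softmax(-1)) / 3
            return p_txt, p_img, p_fuse, probs

    def loss_fn(p_txt, p_img, p_fuse, label, lam=0.1):
        ce = sum(F.cross_entropy(p, label) for p in (p_txt, p_img, p_fuse))
        # KL penalty pulling each single-modality distribution toward the fused one
        kl = sum(F.kl_div(F.log_softmax(p, -1), F.softmax(p_fuse, -1).detach(),
                          reduction="batchmean") for p in (p_txt, p_img))
        return ce + lam * kl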