Multimodal sentiment analysis has become a popular research direction in deep learning in recent years, playing a crucial role in applications such as smart healthcare, smart education, intelligent customer service, and social media analysis. Compared with unimodal sentiment analysis, which relies on a single modality to predict sentiment, multimodal sentiment analysis combines features from different modalities and exploits their complementarity to improve prediction accuracy. Some existing multimodal sentiment analysis methods consider only how to integrate modal features and do not explore the interactions between modalities, which greatly reduces model performance. Moreover, the video, audio, and text modalities each contain rich information on their own, yet most existing methods neglect this unimodal information, leading to unsatisfactory predictions. To address these issues, we design a novel multimodal sentiment analysis framework that learns intra- and inter-modal dynamics, using attention mechanisms for modality interaction and improved fusion strategies. Specifically, (1) we introduce a hierarchical cross-modal attention module to model inter-modal dynamics, with a bi-modal interaction layer and a tri-modal interaction layer that fuse multimodal features; (2) we design a modality reconstruction module with three modality reconstruction submodules to model intra-modal dynamics; and (3) to achieve more reliable predictions, we propose a decision-level fusion subnetwork that fuses the inference results produced independently by the two modules above. Comprehensive experiments and comparisons with existing state-of-the-art methods on the public CMU-MOSI, CMU-MOSEI, and CH-SIMS datasets demonstrate the effectiveness of our model. The proposed model significantly improves the classification performance of multimodal sentiment analysis, achieving an accuracy of 87.61% on CMU-MOSEI in particular. These results indicate that the model has significant potential for further exploration and development in multimodal sentiment analysis.
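To make the hierarchical cross-modal attention design concrete, the sketch below shows one plausible reading of the bi-modal interaction layer as symmetric cross-attention between two modality sequences in PyTorch. All names and hyperparameters here (`BiModalInteraction`, `d_model`, the mean-pooling fusion) are illustrative assumptions, not the paper's actual implementation, whose exact layer definitions the abstract does not specify.

```python
# Illustrative sketch only: a plausible bi-modal cross-attention
# interaction layer of the kind the abstract describes. Names and
# hyperparameters are assumptions, not the paper's implementation.
import torch
import torch.nn as nn

class BiModalInteraction(nn.Module):
    """Fuses two modality sequences with symmetric cross-attention."""
    def __init__(self, d_model: int = 128, n_heads: int = 4):
        super().__init__()
        # Queries come from one modality; keys/values from the other.
        self.attn_a2b = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_b2a = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # a: (batch, len_a, d_model), b: (batch, len_b, d_model)
        a_enriched, _ = self.attn_a2b(a, b, b)  # modality a attends to b
        b_enriched, _ = self.attn_b2a(b, a, a)  # modality b attends to a
        # Pool over time and combine the two enriched views.
        fused = a_enriched.mean(dim=1) + b_enriched.mean(dim=1)
        return self.norm(fused)

# Usage: fuse text-audio, text-video, and audio-video pairs, then pass
# the three bi-modal vectors to a tri-modal interaction layer.
if __name__ == "__main__":
    text = torch.randn(8, 50, 128)    # (batch, seq_len, d_model)
    audio = torch.randn(8, 200, 128)  # sequence lengths may differ
    fused_ta = BiModalInteraction()(text, audio)
    print(fused_ta.shape)  # torch.Size([8, 128])
```

Under this reading, the tri-modal interaction layer would combine the three pairwise outputs into a single representation, while the modality reconstruction submodules and decision-level fusion subnetwork operate on the unimodal features and module-level predictions, respectively.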