
Research On Multimodal Emotion Analysis Method Based On Deep Learning

Posted on: 2024-08-27    Degree: Master    Type: Thesis
Country: China    Candidate: S Dong    Full Text: PDF
GTID: 2568307094959399    Subject: Computer technology
Abstract/Summary:
With the explosive growth of social media and short-video platforms, many people use video to record and share their views on various topics, producing a large amount of multimodal data. Multimodal data typically contains three modalities: text, audio, and visual information. Each modality carries its own emotional features, and the modalities also influence and correlate with one another, providing important complementary information for accurate emotion classification. How to effectively integrate these three kinds of information to accurately classify specific emotion types is therefore a topic worth exploring. This thesis takes videos uploaded by users to social media websites as its research object. To address the ineffective integration of multimodal features and the low accuracy of emotion classification in multimodal sentiment analysis tasks, it makes full use of single-modality information, contextual information, and multimodal interaction information, and builds multimodal sentiment analysis models with deep learning techniques. The main research contents are as follows.

(1) This thesis proposes a multimodal sentiment analysis model that combines temporal convolutional networks with hierarchical feature-level fusion. The model employs a composite hierarchical fusion mechanism: the video, audio, and text modalities are first dimensionally aligned and mapped into the same feature space; pairwise fusion between different modalities then yields three sets of bimodal features; and these bimodal features are in turn fused into a feature matrix containing rich multimodal information. A residual operation integrates the original unimodal features with this multimodal information, producing the final multimodal feature matrix used for sentiment orientation analysis. After each fusion step, a temporal convolutional network processes the features, and a soft attention mechanism filters out noise and redundant information, improving the model's accuracy. To validate the effectiveness of the proposed model, experiments are conducted on publicly available multimodal datasets; the results show that the model is effective and advanced on multimodal emotion classification tasks.

(2) This thesis proposes a multimodal sentiment analysis model based on multitask learning and a heterogeneous hybrid neural network. To handle the large scale and redundancy of video features, the model uses a dual convolution and max-pooling module, and the processed video features are then used for fusion. For text, a bidirectional gated recurrent network extracts contextual semantic features; for audio, a temporal convolutional network performs feature extraction. The three modality-specific branches together form a heterogeneous neural network, and composite feature fusion is performed after the individual modality features have been extracted. In addition, a secondary task of speaker gender recognition is trained alongside the main sentiment classification task, and the loss of this auxiliary task constrains the model and further improves its performance. To validate the accuracy of the model, experiments are conducted on publicly available multimodal datasets, demonstrating its effectiveness and accuracy in multimodal sentiment analysis.
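The hierarchical fusion pipeline described in contribution (1) can be illustrated with a minimal PyTorch sketch. All dimensions, the concatenation-plus-linear pairwise fusion, the single causal Conv1d block standing in for a full temporal convolutional network, and the soft-attention pooling are illustrative assumptions, not the thesis implementation.

```python
# Hypothetical sketch of the hierarchical (pairwise -> trimodal) fusion idea in contribution (1).
# Layer sizes, the Conv1d-based TCN stand-in, and concat-based fusion are assumptions.
import torch
import torch.nn as nn


class TemporalBlock(nn.Module):
    """A minimal causal 1-D convolution block standing in for a TCN layer."""
    def __init__(self, dim, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(dim, dim, kernel_size, dilation=dilation)
        self.relu = nn.ReLU()

    def forward(self, x):                        # x: (batch, seq_len, dim)
        y = x.transpose(1, 2)                    # -> (batch, dim, seq_len)
        y = nn.functional.pad(y, (self.pad, 0))  # left-pad so the convolution stays causal
        y = self.relu(self.conv(y))
        return y.transpose(1, 2)                 # back to (batch, seq_len, dim)


class HierarchicalFusion(nn.Module):
    def __init__(self, text_dim, audio_dim, video_dim, dim=128, num_classes=3):
        super().__init__()
        # 1) Map the three modalities into the same feature space.
        self.proj_t = nn.Linear(text_dim, dim)
        self.proj_a = nn.Linear(audio_dim, dim)
        self.proj_v = nn.Linear(video_dim, dim)
        # 2) Pairwise (bimodal) fusion, here concatenation + linear projection.
        self.fuse_ta = nn.Linear(2 * dim, dim)
        self.fuse_tv = nn.Linear(2 * dim, dim)
        self.fuse_av = nn.Linear(2 * dim, dim)
        # 3) Trimodal fusion of the three bimodal features.
        self.fuse_all = nn.Linear(3 * dim, dim)
        self.tcn = TemporalBlock(dim)
        # Soft attention over time steps to down-weight noisy frames.
        self.attn = nn.Linear(dim, 1)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, text, audio, video):       # each: (batch, seq_len, *_dim)
        t, a, v = self.proj_t(text), self.proj_a(audio), self.proj_v(video)
        ta = self.fuse_ta(torch.cat([t, a], dim=-1))
        tv = self.fuse_tv(torch.cat([t, v], dim=-1))
        av = self.fuse_av(torch.cat([a, v], dim=-1))
        fused = self.fuse_all(torch.cat([ta, tv, av], dim=-1))
        fused = fused + t + a + v                # residual re-injection of unimodal features
        fused = self.tcn(fused)
        weights = torch.softmax(self.attn(fused), dim=1)  # (batch, seq_len, 1)
        pooled = (weights * fused).sum(dim=1)             # attention-weighted pooling
        return self.classifier(pooled)


# Example with dummy aligned sequences (batch=2, 20 time steps).
model = HierarchicalFusion(text_dim=300, audio_dim=74, video_dim=35)
logits = model(torch.randn(2, 20, 300), torch.randn(2, 20, 74), torch.randn(2, 20, 35))
print(logits.shape)  # torch.Size([2, 3])
```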
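Similarly, the multitask heterogeneous network in contribution (2) can be sketched as follows. The dual convolution-and-max-pooling video branch, the bidirectional GRU text branch, the plain Conv1d audio branch, and the 0.3 weight on the auxiliary gender loss are all illustrative assumptions rather than the thesis code.

```python
# Hypothetical sketch of the multitask heterogeneous hybrid network in contribution (2).
import torch
import torch.nn as nn


class MultitaskHeteroNet(nn.Module):
    def __init__(self, text_dim, audio_dim, video_dim, dim=128, num_classes=3):
        super().__init__()
        # Video branch: two Conv1d + max-pool stages to shrink large, redundant features.
        self.video_branch = nn.Sequential(
            nn.Conv1d(video_dim, dim, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(dim, dim, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool1d(2),
            nn.AdaptiveAvgPool1d(1),
        )
        # Text branch: bidirectional GRU for contextual semantic features.
        self.text_gru = nn.GRU(text_dim, dim // 2, batch_first=True, bidirectional=True)
        # Audio branch: a plain Conv1d stands in for the temporal convolutional network.
        self.audio_branch = nn.Sequential(
            nn.Conv1d(audio_dim, dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.fusion = nn.Linear(3 * dim, dim)
        self.sentiment_head = nn.Linear(dim, num_classes)  # main task
        self.gender_head = nn.Linear(dim, 2)               # auxiliary task

    def forward(self, text, audio, video):                 # each: (batch, seq_len, *_dim)
        v = self.video_branch(video.transpose(1, 2)).squeeze(-1)
        _, h = self.text_gru(text)                         # h: (2, batch, dim // 2)
        t = torch.cat([h[0], h[1]], dim=-1)                # concat forward/backward states
        a = self.audio_branch(audio.transpose(1, 2)).squeeze(-1)
        shared = torch.relu(self.fusion(torch.cat([t, a, v], dim=-1)))
        return self.sentiment_head(shared), self.gender_head(shared)


# Joint objective: main sentiment loss plus a down-weighted auxiliary gender loss.
model = MultitaskHeteroNet(text_dim=300, audio_dim=74, video_dim=35)
sent_logits, gender_logits = model(torch.randn(2, 20, 300), torch.randn(2, 20, 74), torch.randn(2, 20, 35))
loss = nn.functional.cross_entropy(sent_logits, torch.tensor([0, 2])) \
     + 0.3 * nn.functional.cross_entropy(gender_logits, torch.tensor([1, 0]))
print(loss.item())
```

Because both heads read the same fused representation, the auxiliary gender loss acts as an additional constraint (a regularizer) on the shared features, which is the role the abstract ascribes to the secondary task.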
Keywords/Search Tags:Multimodal Sentiment Analysis, Deep Learning, Temporal Convolutional Networks, Multimodal Fusion, Attention Mechanism