
Research on Multimodal Sentiment Analysis and Intelligent Music Generation Methods

Posted on: 2024-08-16    Degree: Master    Type: Thesis
Country: China    Candidate: Y H Liu    Full Text: PDF
GTID: 2555307103995799    Subject: Computer technology
Abstract/Summary:
Multimodal sentiment intelligence is an important branch of artificial intelligence: by integrating and analyzing multimodal data such as text, audio, and video, it supports a variety of user-facing downstream tasks. Early research on sentiment intelligence focused mainly on text sentiment analysis and text summary generation, but the text modality alone carries limited information, and its inherent ambiguity makes it difficult for text features to supervise a model accurately. As a result, models struggle to perceive the true sentiment of the moment, to predict the speaker's sentiment accurately, and, even more so, to generate or recommend relevant content. With the rapid development of multimedia, social information has gradually shifted from plain text to diversified multimodal information such as audio and video. This multimodal information carries richer sentiment cues and has given rise to a hierarchical, cross-disciplinary research system of multimodal sentiment intelligence, with applications such as sentiment-aware robots, sentiment-related content recommendation, and music-based psychological therapy. Multimodal sentiment intelligence requires models to possess both shallow information cognition and deep information understanding. This thesis focuses on Multimodal Sentiment Analysis (MSA) as a natural language processing task and on Intelligent Music Context Generation (IMCG) based on multimodal sentiment analysis. However, existing research on multimodal sentiment intelligence still relies largely on the text modality alone, fails to exploit the audio and visual modalities, and exhibits a pronounced text-dominance problem. Specifically, there are three important problems in the field: a lack of data resources, unbalanced multimodal contributions, and a single form of content generation. To address these issues, the thesis carries out the following research:

(1) To address the lack of data resources, this thesis constructs CH-SIMS v2.0, the largest semi-supervised Chinese multimodal (text, acoustic, and visual) fine-grained sentiment analysis dataset, aiming to study the effectiveness of Chinese nonverbal behavior. In addition, feasibility experiments are conducted on both the unimodal and the multimodal labels in the dataset. Experiments show that the dataset's fine-grained labels improve the sentiment prediction ability of mainstream models. This work is a first step toward exploring acoustic and visual emotional cues; the dataset and associated algorithms are open-sourced for other researchers.

(2) To address unbalanced multimodal contributions, this thesis proposes the Acoustic-Visual Mixup Consistent (AV-MC) framework. Its modality-mixing module is an augmentation strategy that mixes the acoustic and visual information of different videos, constructing potential multimodal contexts and combining them with the text already in the dataset, so that the model learns how sentiment predictions change under different nonverbal contexts. Experiments show that the AV-MC framework helps the model further exploit emotional cues in the acoustic and visual modalities and paves the way for interpretable end-to-end human-computer interaction in real-world scenarios.

(3) To address the single form of music content generation, this thesis focuses on the application of sentiment intelligence and builds a platform for multimodal sentiment analysis and music content generation. Because stand-alone music generation struggles to produce music with a distinct emotional style, this thesis combines the sentiment elements and musical tonality in acoustic information and designs a Melody-Chord Separation on Time (MCST) preprocessing method based on the Transformer network, enabling the model to generate music with a rich sentiment style. In addition, for secondary processing of the generated musical fragments, a Chord2Vec chord replacement method is proposed to enhance the sentiment diversity of the music. This work achieves competitive performance on the open-source PianoClassic dataset.
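The modality-mixing idea in (2) can be illustrated with a minimal mixup-style sketch. This is a hypothetical reconstruction, not the thesis's actual AV-MC implementation: the function name `modality_mixup`, the Beta-distributed mixing weight, and the feature shapes are all assumptions; the thesis's mixing rule and consistency objective may differ.

```python
import numpy as np

def modality_mixup(acoustic_a, visual_a, acoustic_b, visual_b,
                   alpha=0.4, rng=None):
    """Mix the acoustic and visual features of two video samples.

    A single weight lam ~ Beta(alpha, alpha) is drawn and applied to
    both nonverbal modalities, producing a new "potential multimodal
    context" that can be paired with existing text features.
    (Hypothetical sketch; the actual AV-MC mixing rule may differ.)
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    mixed_acoustic = lam * acoustic_a + (1.0 - lam) * acoustic_b
    mixed_visual = lam * visual_a + (1.0 - lam) * visual_b
    return mixed_acoustic, mixed_visual, lam
```

In a training loop, the mixed acoustic/visual pair would be combined with the original text features of one of the two samples, letting the model observe how its sentiment prediction shifts under a different nonverbal context.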
Keywords/Search Tags: Multimodal sentiment intelligence, Multimodal sentiment analysis, Multimodal sentiment analysis dataset construction, Acoustic and visual modality enhancement, Sentiment-intelligent music content generation