
Emotion Understanding And Generation With Multimodal Correlation

Posted on: 2022-03-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G Y Shen
Full Text: PDF
GTID: 1488306746457694
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the development of multimedia technology, the modalities of network data have become increasingly complex and expressive. Meanwhile, research in artificial intelligence has made great progress in imitating human physiology, while research at the psychological level is still emerging and leaves considerable room for development. If machines could accurately understand and express the emotions in multimodal data, this would not only carry significant interdisciplinary value between psychology and computer science, but would also greatly improve user understanding and content creation in practical applications. At present, related work suffers from defects such as limited dataset generality, weakly correlated multimodal emotional features, and poor robustness and interpretability of multimodal fusion and generation, and therefore cannot meet the needs of complex real-world environments. This thesis therefore studies emotion understanding and generation with multimodal correlation. We establish multiple standard datasets and benchmarks, extract multimodal emotion features, and construct machine learning models under the guidance of psychology. In this way, we capture and enhance the correlation between modalities, achieving better recognition of user emotions, robust reasoning about content emotions, and stronger interpretability in emotion data generation, and providing a research foundation and future directions for related fields. The main contributions of this thesis are summarized as follows:

· We propose a multimodal method for recognizing user depression on social media. To capture multimodal correlations in social media big data and improve the understanding of user emotions, this thesis constructs a high-quality, large-scale dataset of depressed users on social media and extracts multimodal features highly related to emotion under the guidance of psychological theory. We use a multimodal dictionary learning algorithm to obtain a joint sparse representation for emotion classification, which achieves good performance in depression recognition (see the first sketch after this list). In addition, we uncover several typical online behaviors of depressed users at scale.

· We propose a multimodal human emotion reasoning method based on video data. To capture and enhance the correlation between modalities when modal information is incomplete, and to improve the robustness of content emotion understanding, this thesis constructs a large dataset for multimodal human emotion reasoning in videos, which provides person-level emotion annotations under modality absence. We propose a multimodal emotion reasoning model based on the self-attention mechanism; while performing multimodal fusion, it also exploits reasoning cues such as emotional communication and emotional context, achieving good performance on this dataset (see the second sketch below). This work supports the development of more advanced emotion reasoning algorithms.

· We propose a controllable expression video generation method in which the intensity of each frame is self-inferred. To balance robustness, controllability, and interpretability in emotion generation, this thesis proposes an intensity-based video generation method that synthesizes expression videos from a single neutral face, capturing the multimodal correlation between images and videos. The highlight is that the expression intensity of each frame can be inferred automatically during training, avoiding complicated and inaccurate manual intensity labeling (see the third sketch below). In addition, we provide a unified generation model covering multiple expressions. This method facilitates public content creation and enhances the overall vitality of the Internet.
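The first contribution's joint sparse representation can be illustrated with a minimal, hypothetical sketch: modality features are concatenated so that a single learned dictionary yields one shared sparse code per user, which then feeds an ordinary classifier. The data, dimensions, and scikit-learn setup here are assumptions for illustration, not the thesis's actual pipeline.

```python
# A minimal sketch of a joint sparse representation for multimodal
# depression classification; all features and sizes are hypothetical.
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical per-user features from two modalities (e.g., text, image).
n_users = 200
text_feats = rng.normal(size=(n_users, 50))
image_feats = rng.normal(size=(n_users, 30))
labels = rng.integers(0, 2, size=n_users)      # 1 = depressed, 0 = control

# Concatenating modalities and learning one dictionary forces each user to
# be encoded by a single sparse code -- a simple stand-in for a "joint"
# multimodal sparse representation.
X = np.hstack([text_feats, image_feats])
dico = DictionaryLearning(n_components=40, alpha=1.0, max_iter=100,
                          transform_algorithm="lasso_lars", random_state=0)
codes = dico.fit_transform(X)                  # joint sparse codes

# The sparse codes feed a standard classifier for depression recognition.
clf = LogisticRegression(max_iter=1000).fit(codes, labels)
print("train accuracy:", clf.score(codes, labels))
```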
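For the second contribution, a minimal sketch of self-attention fusion under modality absence: each modality is projected into a shared token space, and absent modalities are excluded through an attention mask rather than zero-filled. The module names, dimensions, and seven-class output are assumptions; the thesis's actual model, with emotional communication and context cues, is richer.

```python
# A minimal sketch of masked self-attention fusion over modality tokens.
import torch
import torch.nn as nn

class EmotionReasoner(nn.Module):
    """Fuses modality features with self-attention; absent modalities
    are masked out of attention instead of being zero-filled."""
    def __init__(self, modality_dims, d_model=256, n_classes=7):
        super().__init__()
        self.names = list(modality_dims)
        self.proj = nn.ModuleDict({m: nn.Linear(d, d_model)
                                   for m, d in modality_dims.items()})
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, feats, present):
        # feats: {modality: (B, dim)}; present: (B, M) bool mask.
        tokens = torch.stack([self.proj[m](feats[m]) for m in self.names], 1)
        out = self.encoder(tokens, src_key_padding_mask=~present)
        # Mean-pool only over modalities that are actually present.
        mask = present.unsqueeze(-1).float()
        pooled = (out * mask).sum(1) / mask.sum(1).clamp(min=1.0)
        return self.head(pooled)

# Hypothetical usage: "audio" is missing for the second sample.
model = EmotionReasoner({"face": 512, "audio": 128})
feats = {"face": torch.randn(2, 512), "audio": torch.randn(2, 128)}
present = torch.tensor([[True, True], [True, False]])
logits = model(feats, present)   # (2, 7)
```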
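For the third contribution, a minimal sketch of intensity-conditioned generation: during training a small estimator predicts a per-frame intensity from the real target frame, so no manual intensity labels are needed, while the generator renders a frame from a neutral face, an expression label, and that intensity; sweeping the intensity at inference yields a video. All networks here are hypothetical stand-ins, not the thesis's actual architecture.

```python
# A minimal sketch of intensity-conditioned expression generation.
import torch
import torch.nn as nn

class IntensityEstimator(nn.Module):
    """Predicts a scalar intensity in [0, 1] for a target frame, standing in
    for the self-inferred per-frame intensities described above."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 1), nn.Sigmoid())

    def forward(self, frame):
        return self.net(frame)                 # (B, 1)

class Generator(nn.Module):
    """Renders one frame from a neutral face, an expression label, and a
    continuous intensity -- the knob that makes generation controllable."""
    def __init__(self, n_expressions=6):
        super().__init__()
        self.embed = nn.Embedding(n_expressions, 16)
        self.net = nn.Sequential(
            nn.Conv2d(3 + 17, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())

    def forward(self, neutral, expression, intensity):
        B, _, H, W = neutral.shape
        cond = torch.cat([self.embed(expression), intensity], dim=1)  # (B, 17)
        cond = cond[:, :, None, None].expand(B, 17, H, W)
        return self.net(torch.cat([neutral, cond], dim=1))

# Training pairs the generator with intensities inferred from real target
# frames; at inference, sweeping intensity from 0 to 1 produces a video.
gen, est = Generator(), IntensityEstimator()
neutral = torch.randn(1, 3, 64, 64)
target = torch.randn(1, 3, 64, 64)
fake = gen(neutral, torch.tensor([2]), est(target))
video = [gen(neutral, torch.tensor([2]), torch.tensor([[t / 9]]))
         for t in range(10)]                   # 10 frames, rising intensity
```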
Keywords/Search Tags:Affective Computing, Multimodal, Video Processing, Social Media, Machine Learning