Understanding and representing multimodal data has long been an important research topic in artificial intelligence. One major line of work models multimodal data with deep probabilistic generative models, and in recent years research built on the variational autoencoder framework has produced fruitful results in this area. However, the inherent characteristics of multimodal data (multiple types, heterogeneity, and redundancy) still leave many modeling problems open. Recent studies have shown that disentangling the shared and private information of multimodal data can effectively improve model inference and data generation, but these approaches may fail to extract the information of each modality accurately. This thesis finds that the alignment and fusion of shared information are the key factors, and therefore incorporates metric learning and self-supervised learning into the modeling. The main research results are as follows:

For the representation and generation of multimodal data, this thesis proposes a self-supervised learning based disentangling multimodal variational autoencoder (SD-MVAE). The model improves disentanglement and representation mainly through three components: 1) a multimodal data generation mechanism that combines shared and private latent vectors; 2) fusion of the shared latent vectors with a product-of-experts function; 3) alignment of the shared latent vectors with a self-supervised triplet loss. Experiments on the MNIST-SVHN and MNIST-CDCB multimodal datasets show that SD-MVAE can effectively disentangle and represent the data. The learned representations significantly improve the accuracy of cross-generation and translation-generation as well as the quality of generated images, and they also improve performance on downstream tasks such as multimodal data classification and cross-modal retrieval.

However, SD-MVAE has a large number of training parameters and has difficulty disentangling and representing the different modalities. To address these problems, this thesis proposes a quadruplet metric loss based multimodal variational autoencoder (Q-MVAE). The model optimizes the network structure and the training objective and introduces a quadruplet metric learning loss, so that with fewer training parameters it achieves performance comparable to SD-MVAE. Experiments on the MNIST-SVHN and CelebA datasets show that Q-MVAE performs well both in data representation and generation and in downstream tasks. Moreover, the model shows potential for finer-grained disentanglement, representation, and generation of multimodal data, which suggests its applicability to image processing.

In summary, to address the representation and generation of multimodal data, this thesis proposes corresponding models and algorithms that incorporate metric learning into the variational autoencoder framework. These results may provide ideas and technical support for processing multimodal data with deep probabilistic generative models.
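
To make the two mechanisms named in the SD-MVAE description more concrete, the following is a minimal PyTorch sketch of product-of-experts fusion of modality-specific Gaussian posteriors and of a triplet loss that aligns shared latent codes across modalities. All function names, the inclusion of a prior expert, and the hyperparameters are illustrative assumptions, not the thesis's actual implementation.

```python
# Minimal sketch: product-of-experts fusion and triplet-based alignment of
# shared latent codes. Illustrative only; details of SD-MVAE may differ.
import torch
import torch.nn.functional as F


def product_of_experts(mus, logvars, prior_var=1.0):
    """Fuse per-modality Gaussian posteriors over the shared code into one Gaussian.

    mus, logvars: lists of (batch, dim) tensors, one pair per modality.
    A unit-Gaussian prior expert is included, a common choice in PoE-based
    multimodal VAEs (an assumption here, not stated in the abstract).
    """
    precisions = [torch.exp(-lv) for lv in logvars]
    precisions.append(torch.full_like(mus[0], 1.0 / prior_var))
    means = list(mus) + [torch.zeros_like(mus[0])]

    total_precision = sum(precisions)          # sum of inverse variances
    fused_var = 1.0 / total_precision
    fused_mu = fused_var * sum(p * m for p, m in zip(precisions, means))
    return fused_mu, torch.log(fused_var)


def shared_alignment_triplet_loss(z_a, z_b, margin=1.0):
    """Self-supervised triplet loss on shared codes of two paired modalities.

    For each paired sample, the shared code from modality A is the anchor,
    the code from modality B is the positive, and a mismatched (shuffled)
    code from modality B serves as the negative.
    """
    negatives = z_b[torch.randperm(z_b.size(0))]
    return F.triplet_margin_loss(z_a, z_b, negatives, margin=margin)
```

In a training loop under these assumptions, each modality encoder would output its own (mu, logvar) for the shared code; `product_of_experts` fuses them before sampling, and `shared_alignment_triplet_loss` is added to the VAE objective to pull the per-modality shared codes of the same sample together while pushing mismatched samples apart.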