With the rapid development of computer technology and social networks, massive amounts of data are generated all the time in daily life. How to use such data to realize intelligent tasks has become a research hotspot. In practical applications, massive data usually exist in different modalities, such as text, images, video, audio, and 3D models. Although these data take different forms, data of different modalities may be highly correlated and may even describe the same thing. In current research based on multi-modal data, cross-modal intelligence and related research that exploits the correlation between data of different modalities have attracted much attention due to their wide application. As a subtask of cross-modal intelligence, cross-modal generation is widely used in practical scenarios such as computer-aided design, image editing, machine translation, and information digitization. Current research shows that cross-modal generation methods based on deep neural networks outperform those based on traditional machine learning algorithms and have become the main research direction in the field of cross-modal generation. Cross-modal generation not only generates data of one modality from another, but also requires the generated data to be so similar to real data that the two are difficult to distinguish. For the task of cross-modal generation, this thesis takes the cascaded adversarial network as the basic generation framework and mainly studies cross-modal generation methods among text, images, and 3D point clouds. The specific research contents are as follows:

(1) A text-to-image generation method based on background induction and a multi-level discriminator is proposed. The method combines a cascaded adversarial network with a hybrid attention mechanism to construct a multi-stage image generation framework, and adds a background image to this framework as auxiliary information. Under the joint constraint of the text description and the background image, the method can generate diverse images with different foreground objects on a given background. In addition, the method introduces a multi-level discriminator and a corresponding multi-level discrimination loss to further improve the quality of image generation. Experimental results on the CUB bird dataset demonstrate the superiority of the proposed method and its ability to generate images on a given background.

(2) A cascaded generation method is proposed for dense point cloud reconstruction from a single image. The method combines a pre-reconstruction network with an up-sampling network to construct a multi-stage point cloud generation network. Meanwhile, an image re-description mechanism is designed to optimize this network by generating images back from the reconstructed point clouds. In addition, the method introduces a Siamese structure to extract consistent high-level semantics from multiple images, further enhancing the semantic correlation between images and reconstructed point clouds. During optimization of the multi-stage point cloud generation network, the training difficulty is significantly reduced through stage-wise training followed by fine-tuning of the whole network. Extensive experiments on the ShapeNet dataset show that the proposed method significantly outperforms existing point cloud reconstruction methods.
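The coarse-to-fine structure of the cascaded framework in contribution (1) can be illustrated with a minimal NumPy sketch. The function names (`initial_stage`, `refine_stage`, `discriminator_score`) and the toy linear operations are hypothetical placeholders, not the networks proposed in this thesis; the sketch only shows the skeleton in which each stage upsamples and refines the previous stage's output under text and background conditioning, and each stage has its own discriminator.

```python
import numpy as np

rng = np.random.default_rng(0)

def initial_stage(text_embedding, background):
    # Toy stand-in for the first generator stage: produce a coarse
    # 64x64 "image" conditioned on the text and the background image.
    coarse = 0.5 * background[::4, ::4] + 0.5 * np.tanh(text_embedding.mean())
    return coarse

def refine_stage(prev_image, text_embedding):
    # Toy refinement stage: 2x nearest-neighbour upsampling plus a small
    # text-conditioned correction (placeholder for a real generator).
    up = prev_image.repeat(2, axis=0).repeat(2, axis=1)
    return up + 0.1 * np.tanh(text_embedding.mean())

def discriminator_score(image):
    # Placeholder per-stage discriminator: one scalar "realism" score.
    return float(np.tanh(image).mean())

text_embedding = rng.normal(size=128)
background = rng.normal(size=(256, 256))

# Cascade: 64x64 -> 128x128 -> 256x256, one discriminator per stage,
# so a multi-level discrimination loss could sum the per-stage scores.
stages = [initial_stage(text_embedding, background)]
for _ in range(2):
    stages.append(refine_stage(stages[-1], text_embedding))

scores = [discriminator_score(img) for img in stages]
print([img.shape for img in stages])  # [(64, 64), (128, 128), (256, 256)]
```

In a real system each stage and discriminator would be a trained convolutional network; the point of the sketch is only the cascaded data flow.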
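Reconstruction quality on benchmarks such as ShapeNet is commonly measured with the Chamfer distance between the reconstructed and ground-truth point sets; whether this exact metric is the one used in the thesis is an assumption here. A minimal NumPy version:

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point sets p (N, 3) and q (M, 3).

    For each point in one set, take the squared distance to its nearest
    neighbour in the other set, then average both directions.
    """
    # Pairwise squared distances via broadcasting, shape (N, M).
    d2 = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

# Identical clouds have zero distance; a shifted copy does not.
rng = np.random.default_rng(1)
cloud = rng.normal(size=(1024, 3))
print(chamfer_distance(cloud, cloud))            # 0.0
print(chamfer_distance(cloud, cloud + 0.1) > 0)  # True
```

The O(N*M) pairwise matrix is fine at this scale; large clouds would use a KD-tree nearest-neighbour query instead.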