
Research On Generative Adversarial Network-Based Cross-modal Image Generation

Posted on: 2021-01-08 | Degree: Master | Type: Thesis
Country: China | Candidate: K Yue | Full Text: PDF
GTID: 2428330614971223 | Subject: Computer Science and Technology
Abstract/Summary:
Image generation, an important problem in machine learning, has attracted growing attention with the rise of generative adversarial networks. Besides filling in missing data and producing approximations of real data, generative models can be used to verify a model's capacity to express high-dimensional probability distributions, to handle multi-modal output problems, and to support reinforcement learning.

Conventional single-modal image generation takes random noise or a base image as input. A network obtained this way has limited flexibility: it can only manipulate images along one or a few fixed patterns. Introducing text information, by contrast, gives the network far more flexibility. By associating semantic information with image features, the generated image can change as the given text description changes. Compared with single-modal generation, semantic image generation must also consider, beyond the network architecture itself, how to map data from the two different modalities onto each other. At present, existing models for text-guided image manipulation produce images of inferior quality and low resolution. To address this, we put forward new solutions in this research. The main work and contributions are as follows.

First, we present a new progressive cross-modal image generation model. The generators of previous methods all directly combine the visual features extracted from the original image with the semantic features, then feed the fused features into several residual blocks for transformation; on the discriminator side, they rely on different loss functions to improve training. However, as the image resolution increases, training such a network becomes harder. We therefore manipulate the image progressively, starting from a low resolution, and we introduce a union module that retains more of the original image's details during manipulation (a minimal sketch of this design follows the abstract). Experimental results on a fine-grained image dataset demonstrate the effectiveness of our method.

Second, because fine-grained image datasets lack text descriptions, we incorporate bilinear attention pooling into a unified cross-modal visual-semantic embedding model (also sketched below). The improved model not only strengthens the mapping between fine-grained visual features and semantic features, but also reduces interference from the image background. To a certain extent, it can correctly generate the corresponding image from the given semantics even under small-sample conditions. To verify the effect of our method, we make a detailed comparison with the original method, and the superiority of the improved method is fully confirmed.
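The following is a minimal PyTorch sketch of the progressive, union-module idea described above, included only as an illustration: the module names (UnionModule, Stage), the gated fusion rule, and all channel sizes are assumptions made for exposition, not the thesis's actual implementation.

import torch
import torch.nn as nn

class UnionModule(nn.Module):
    # Assumed design: a learned gate mixes the edited features with the
    # original-image features, so details unrelated to the text edit are
    # carried through unchanged.
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels * 2, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, edited, original):
        g = self.gate(torch.cat([edited, original], dim=1))
        return g * edited + (1 - g) * original

class Stage(nn.Module):
    # One progressive stage: fuse the sentence embedding with the image
    # features, refine with conv blocks, preserve original detail via the
    # union module, then upsample 2x for the next stage.
    def __init__(self, channels, text_dim):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, channels)
        self.refine = nn.Sequential(
            nn.Conv2d(channels * 2, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.union = UnionModule(channels)
        self.up = nn.Upsample(scale_factor=2, mode="nearest")

    def forward(self, feat, text_emb):
        # Broadcast the sentence embedding over the spatial grid.
        t = self.text_proj(text_emb)[:, :, None, None].expand_as(feat)
        edited = self.refine(torch.cat([feat, t], dim=1))
        return self.up(self.union(edited, feat))

Stacking several such stages grows the working resolution gradually (for example 64 to 128 to 256), which is the sense in which training starts at an easy low-resolution problem and only then moves to harder high-resolution manipulation.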
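Similarly, the sketch below shows bilinear attention pooling as a drop-in feature-pooling step for the visual-semantic embedding, again assuming PyTorch; the attention head, the number of parts, and the normalization are assumptions, not the thesis's exact formulation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BilinearAttentionPooling(nn.Module):
    # Predicts one spatial attention map per object part and pools the
    # backbone features under each map, so the embedding is driven by
    # fine-grained parts rather than the background.
    def __init__(self, in_channels, num_parts):
        super().__init__()
        self.attn = nn.Conv2d(in_channels, num_parts, kernel_size=1)

    def forward(self, feat):                # feat: (B, C, H, W)
        a = torch.relu(self.attn(feat))     # (B, M, H, W) part attention
        # Bilinear pooling: part feature m is the attention-weighted sum
        # of the feature map over all spatial positions.
        parts = torch.einsum("bmhw,bchw->bmc", a, feat)
        # L2-normalize the flattened part matrix before projecting it
        # into the joint visual-semantic space.
        return F.normalize(parts.flatten(1), dim=1)  # (B, M * C)

In a full pipeline, the pooled part embedding would be projected into the same space as the text embedding and trained with a matching objective such as a triplet or ranking loss, so that a caption is aligned with the correct fine-grained parts rather than the background.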
Keywords/Search Tags: Generative Adversarial Network, Image Generation, Semantic Image Synthesis, Cross-modal, Visual-semantic Embedding