
Conditional Image Generation Method Based On Generative Adversarial Network

Posted on: 2022-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Wang
Full Text: PDF
GTID: 2518306494969059
Subject: Computer technology
Abstract/Summary:
With the continuous development of deep neural networks, image generation has made great progress. The task began with generating realistic images from random noise and then moved to conditional generation, producing images from conditions such as class labels or text descriptions. Unlike a label, a text description is a complete sentence, and a large semantic gap separates text from images, so generating realistic, semantically faithful images from text descriptions is very difficult. Recently, research has also attempted to clarify the meaning of each dimension of the latent vector, giving rise to the task of disentangled, layer-wise image generation. As a relatively new research direction, it too faces difficulties and challenges. This thesis therefore focuses on two aspects of image generation: generating images from text descriptions, and disentangled representation with hierarchical generation.

At present, generative adversarial networks have become the main generative model for image generation tasks, and most of them are built on convolutional neural networks. In text-to-image models, the convolutions that turn low-resolution feature maps into high-resolution ones can rely only on local features, which hinders the model from capturing long-distance dependencies in the text description; when the text carries too much information, it is difficult to present all of it in the image. To address this problem, this thesis proposes a pixel-word collaborative attention module, which combines a pixel-level self-attention module with a word-level attention module to gradually refine the image from pixels to text. At the same time, the pre-trained BERT model is fine-tuned to extract word embeddings, which improves the generalization ability of the model to a certain extent. Extensive experiments were conducted on the CUB and COCO datasets, with the IS score and R-precision score used for evaluation; the results verify the effectiveness and superiority of the proposed model.

For disentangled representation and hierarchical image generation, a common method is to associate different condition vectors with different features and use multiple generators to layer those features, for example separating the image background from the foreground. However, in most models it is difficult to know which specific feature each dimension of the input vector controls before the experimental results are available. In response, this thesis improves a disentangled, hierarchical image generation model so that, on top of separating background, posture, and color, the color can be changed and generated as specified. The condition vector responsible for color is replaced with a one-hot code over a set of color categories, and a color classifier is added to the discriminator responsible for color; it classifies the generated image into a color category and computes the error against the true category. By adding this color-category loss, the model can generate the required color according to the chosen code. Moreover, a self-attention module is added to improve the quality of the generated images during feature layering. To verify the effectiveness of the proposed model, ablation and comparison experiments were carried out on the CUB and Cars datasets, with the IS and FID scores used for evaluation, confirming the model's effectiveness and advancement.
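The pixel-word collaborative attention described above can be sketched as a module that first applies self-attention over the spatial positions of a feature map and then attends those positions to the word embeddings. This is a minimal illustrative sketch, not the thesis's exact architecture: the module name, dimensions, residual connections, and the use of multi-head attention are all assumptions.

```python
import torch
import torch.nn as nn

class PixelWordAttention(nn.Module):
    """Sketch of a pixel-word collaborative attention block (assumed design):
    pixel-level self-attention followed by pixel-to-word cross-attention."""

    def __init__(self, img_dim, word_dim, heads=4):
        super().__init__()
        # self-attention over spatial positions of the image feature map
        self.self_attn = nn.MultiheadAttention(img_dim, heads, batch_first=True)
        # project word embeddings (e.g. from a fine-tuned BERT) into image space
        self.word_proj = nn.Linear(word_dim, img_dim)
        # cross-attention: pixels query the projected word embeddings
        self.cross_attn = nn.MultiheadAttention(img_dim, heads, batch_first=True)

    def forward(self, feat, words):
        # feat: (B, C, H, W) image features; words: (B, T, word_dim) embeddings
        B, C, H, W = feat.shape
        px = feat.flatten(2).transpose(1, 2)                 # (B, H*W, C)
        px = px + self.self_attn(px, px, px, need_weights=False)[0]
        w = self.word_proj(words)                            # (B, T, C)
        px = px + self.cross_attn(px, w, w, need_weights=False)[0]
        return px.transpose(1, 2).reshape(B, C, H, W)
```

Because the cross-attention sees every word for every pixel, it can capture the long-distance text dependencies that a purely convolutional upsampling path misses.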
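The color-conditioning idea, a one-hot color code on the generator side and an auxiliary color classifier on the discriminator side, follows an AC-GAN-style pattern. The sketch below is a hedged illustration under assumed details: the backbone, layer sizes, and the number of color categories (`NUM_COLORS`) are hypothetical, not taken from the thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_COLORS = 10  # assumed size of the colour vocabulary

class ColorDiscriminator(nn.Module):
    """Discriminator with an auxiliary colour classifier (assumed architecture):
    one head scores real/fake, the other predicts the colour category."""

    def __init__(self, feat_dim=128, num_colors=NUM_COLORS):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.adv_head = nn.Linear(feat_dim, 1)            # real/fake score
        self.color_head = nn.Linear(feat_dim, num_colors)  # colour classifier

    def forward(self, img):
        h = self.backbone(img)
        return self.adv_head(h), self.color_head(h)

def color_loss(color_logits, target_ids):
    # cross-entropy between the predicted colour and the index encoded
    # by the one-hot colour code fed to the generator
    return F.cross_entropy(color_logits, target_ids)
```

During training, the generator receives `F.one_hot(target_ids, NUM_COLORS)` as its color condition, and adding `color_loss` to both players pushes generated images toward the specified color category.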
Keywords/Search Tags:Image generation, Generative adversarial networks, Text to image generation, Image feature disentanglement