
Generative Adversarial Network For Text-to-Image Synthesis

Posted on: 2021-03-28
Degree: Master
Type: Thesis
Country: China
Candidate: D Y Chen
Full Text: PDF
GTID: 2428330623968547
Subject: Engineering
Abstract/Summary:
The text-to-image synthesis task aims to generate images that semantically match an input sentence describing details (e.g., color and shape) of an object. Because a single sentence can semantically match several images with different content, the task requires not only that the semantics of the generated image be consistent with the input text, but also that the content of the generated images be diverse. Existing text-to-image synthesis models use Generative Adversarial Networks (GANs) as the basic framework. However, because the theory of GANs is itself imperfect, training is often unstable. Meanwhile, to make the generated images sufficiently realistic and natural, their resolution must be large enough, which inevitably brings a large number of network parameters and computations. In this work, we propose the following three algorithms for these specific problems:

1) To address the unstable training process, we propose the Perceptual Pyramid Adversarial Network (PPAN). This network adopts a pyramid structure to enhance feature representations at all scales, and a perceptual loss to directly regularize the generated images and real-world images in feature space. Both modules are built on a basic hierarchical-nested structure. Experiments show that they not only make the training process more stable but also improve the quality of the generated images.

2) To address the large number of parameters and computations in the network, we propose the Lightweight Dynamic Conditional GAN with Pyramid Attention (LD-CGAN). This network greatly simplifies the structure without reducing the quality of the generated images. LD-CGAN introduces an information-compensation scheme: whereas previous methods take the semantic information as input only once, LD-CGAN first disentangles the input text features semantically in an unsupervised manner, and then uses the proposed Conditional Manipulating Module to continuously compensate the disentangled semantics into features at all scales. Compared with PPAN, the number of parameters and computations is reduced by up to 80% without reducing the quality of the generated images.

3) To address the low quality of generated images, we propose the Fine-grained Perceptual Pyramid Adversarial Network (FPAN). This network adopts a whole-to-parts training strategy: based on the initial high-quality image produced by the Whole Synthesizer, the Parts Synthesizer uses word features to enhance local regions of the generated image, and its discriminators introduce a word-by-word attention mechanism to improve semantic consistency. FPAN thus makes full use of word features to correct and refine the generated content. Consequently, the fidelity, vividness, and diversity of images generated by FPAN greatly exceed the results of state-of-the-art models.
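The perceptual loss used by PPAN can be illustrated with a minimal sketch. It measures the distance between generated and real images in the feature space of a frozen feature extractor rather than in pixel space. Here a fixed random projection stands in for the pretrained feature network; the matrix `W`, the feature map `features`, and all shapes are hypothetical placeholders, not the thesis's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "feature extractor": a fixed (frozen) random projection.
# In the actual method this would be a pretrained convolutional network;
# the random matrix here is only an illustrative placeholder.
W = rng.standard_normal((64, 16))  # maps 64-dim flattened "images" to 16-dim features


def features(x):
    """Frozen feature map phi(x); never updated during training."""
    return np.tanh(x @ W)


def perceptual_loss(generated, real):
    """Mean squared distance between feature representations.

    Regularizes the generator in feature space, which tends to be more
    stable than matching raw pixels.
    """
    diff = features(generated) - features(real)
    return float(np.mean(diff ** 2))


real = rng.standard_normal((8, 64))                # batch of "real" images (flattened)
fake = real + 0.1 * rng.standard_normal((8, 64))   # generator output close to real
far = rng.standard_normal((8, 64))                 # unrelated generator output

# An output near the real image yields a smaller perceptual loss than
# an unrelated one, so minimizing this term pulls the generator toward
# perceptually similar images.
print(perceptual_loss(fake, real), perceptual_loss(far, real))
```

In a full GAN, this term would be added to the adversarial loss of the generator with a weighting coefficient, so the generator is pushed both to fool the discriminator and to stay close to real images in feature space.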
Keywords/Search Tags: Deep Learning, Computer Vision, Natural Language Processing, Text-to-Image Synthesis, Generative Adversarial Network