It is easier for people to understand vivid image information than abstract and complex text, because an image highlights the key content more directly. However, it is much harder to obtain images that match a given piece of text. Text-guided image generation combines computer vision and natural language processing and is therefore a cross-disciplinary task: given a text description of details such as the shape and color of an object, the goal is to generate an image with the corresponding semantics. A single description can correspond to many visually different images, so the difficulty lies in generating clear, natural, and diverse images that still match the semantics of the input text. The mainstream approach is to use generative adversarial networks and their variants, typically multi-stage architectures that progressively generate images at increasing resolutions. However, such architectures are unstable and time-consuming to train, require many parameters and much computation, and the generated images often look like a stack of simple patterns that lack detail and realism. Based on the current state of the field and the problems above, the main work of this paper is as follows:

(1) To address the poor visual quality and limited diversity of generated images and their lack of detail, this paper proposes a triple attention-based generative adversarial network (TAGAN). The model uses a single generator-discriminator pair. During upsampling, the generator applies a triple attention mechanism to repeatedly extract text features and refine image detail, and fuses the two kinds of features to produce clear, natural images that are consistent with the text. To help the generator converge, the discriminator uses a one-way output that treats only real and matching image-text pairs as valid, which provides an accurate optimization direction, and applies a matching-aware gradient penalty to improve the agreement between the generated image and the input text (a hedged sketch of such a penalty is given after this summary).

(2) To address the growing complexity of text-to-image models, with their large parameter counts and long training times, this paper proposes a lightweight feature fusion generative adversarial network (LFGAN). During forward propagation, the generator reuses the text information through conditional convolution and dense connectivity, using the text as a condition that adjusts the visual appearance of the generated image. To further improve visual quality, this paper adopts a BERT text encoder and a perceptual loss to strengthen the generator's understanding of the text and the alignment between the two modalities, thereby enriching the detail of the generated image (minimal sketches of text-conditioned convolution and of a perceptual loss also follow after this summary). Because the model uses a simple single-stage structure and supplements the missing information during generation, the number of parameters is greatly reduced while achieving visual quality comparable to that of the comparison models.
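The matching-aware gradient penalty mentioned in (1) is not spelled out in this summary. The following is a minimal sketch, assuming a DF-GAN-style penalty computed on real images paired with their matching text embeddings, with the discriminator producing a single score per image-text pair; the names `discriminator`, `real_images`, and `text_embeddings`, as well as the coefficients `k` and `p`, are illustrative assumptions rather than TAGAN's actual interface.

```python
# Minimal sketch of a matching-aware gradient penalty (assumed DF-GAN-style);
# TAGAN's exact formulation may differ.
import torch


def matching_aware_gradient_penalty(discriminator, real_images, text_embeddings,
                                    k=2.0, p=6.0):
    """Penalize the gradient norm of D at real, matching (image, text) pairs."""
    real_images = real_images.detach().requires_grad_(True)
    text_embeddings = text_embeddings.detach().requires_grad_(True)

    # One-way output: a single validity score per (image, text) pair.
    scores = discriminator(real_images, text_embeddings)

    grad_img, grad_txt = torch.autograd.grad(
        outputs=scores.sum(),
        inputs=(real_images, text_embeddings),
        create_graph=True,
    )
    grad_img = grad_img.reshape(grad_img.size(0), -1)
    grad_txt = grad_txt.reshape(grad_txt.size(0), -1)
    grad_norm = torch.sqrt((grad_img ** 2).sum(dim=1) + (grad_txt ** 2).sum(dim=1))

    # k and p are illustrative defaults, not values taken from the paper.
    return k * (grad_norm ** p).mean()
```

Penalizing the gradient only at real, matching pairs smooths the discriminator's decision surface around the data the generator should imitate, which is the convergence aid described in (1).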
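The summary does not give the exact form of the conditional convolution used by LFGAN in (2). The sketch below shows one common reading, in which the convolution kernel is predicted from the sentence embedding so that the text directly conditions the filtering of image features; the class name, layer sizes, and the per-sample kernel generation are assumptions.

```python
# Minimal sketch of text-conditioned convolution: the kernel is predicted
# from the sentence embedding. Layer sizes and the per-sample kernel
# generation are illustrative assumptions, not LFGAN's actual design.
import torch.nn as nn
import torch.nn.functional as F


class TextConditionedConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, text_dim, kernel_size=3):
        super().__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        # Predict one convolution kernel per sample from its text embedding.
        self.to_weight = nn.Linear(
            text_dim, out_channels * in_channels * kernel_size * kernel_size)

    def forward(self, x, sentence_embedding):
        b, c, h, w = x.shape
        weight = self.to_weight(sentence_embedding).view(
            b * self.out_channels, self.in_channels,
            self.kernel_size, self.kernel_size)
        # Grouped convolution trick: treat the batch as groups so that each
        # sample is convolved with its own text-predicted kernel.
        x = x.reshape(1, b * c, h, w)
        out = F.conv2d(x, weight, padding=self.kernel_size // 2, groups=b)
        return out.reshape(b, self.out_channels, h, w)
```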
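The perceptual loss in (2) is a standard technique: the generated and target images are compared in the feature space of a pretrained network rather than in pixel space. The sketch below uses an intermediate layer of torchvision's VGG16; the layer choice, distance function, and loss weighting that LFGAN actually uses are not specified here and are assumed.

```python
# Minimal sketch of a VGG-based perceptual loss. The layer choice, distance
# function, and weighting used by LFGAN are assumptions.
import torch.nn as nn
from torchvision import models


class PerceptualLoss(nn.Module):
    def __init__(self, layer_index=16):
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
        # Frozen feature extractor up to (and including) the chosen layer.
        self.features = nn.Sequential(
            *list(vgg.features.children())[:layer_index + 1]).eval()
        for param in self.features.parameters():
            param.requires_grad = False
        self.criterion = nn.L1Loss()

    def forward(self, generated, target):
        # Compare images in VGG feature space rather than pixel space.
        return self.criterion(self.features(generated), self.features(target))
```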