
Research On Text-to-Image Generation Based On Generative Adversarial Network

Posted on: 2022-08-02
Degree: Master
Type: Thesis
Country: China
Candidate: L C Liao
Full Text: PDF
GTID: 2518306524480644
Subject: Computer Science and Technology
Abstract/Summary:
Text-to-image generation is a cross-modal task. The content relevance and consistency between the word-by-word meaning of a text description and the semantic information of image sub-regions are the core issues of this task. Generative Adversarial Networks (GANs), building on their success in the field of image synthesis, have injected strong momentum into text-to-image generation technology, and models that innovate on the technology from different angles continue to emerge.

Text-to-image generation mainly revolves around three major technical issues: semantic understanding, image generation, and semantic consistency. Taking these three issues as its direction, this thesis explores in depth the advantages and shortcomings of current GAN-based text-to-image generation models, proposes reasonable and efficient modifications, and finally designs model structures with superior performance. The main research results of this thesis are as follows:

(1) An enhanced multi-stage attention text-to-image generation model, called E-AttnGAN, is proposed. This thesis improves the existing multi-stage generation model in two aspects: semantic consistency enhancement and image quality enhancement. For the former, a Dual Attention Structure (DAS) is first proposed, which combines spatial attention between image sub-regions and text words with channel attention between feature-map channels and text words, in order to strengthen the semantic association between image and text (a minimal sketch of this dual-attention idea appears after this abstract). Then, a Conditional Patch Discriminator (CPD) is proposed to ensure that each regional block of the generated image matches the text semantically while remaining realistic and natural in itself. For the latter, a multi-stage image generator based on the residual structure is proposed to reduce the learning burden of the network, and a Dense Feature Perceptual Loss (DFPL) is introduced to control the text-irrelevant randomness of the image (also sketched below).

(2) A style-based single-stage attention text-to-image generation model, called Style-AttnGAN, is proposed. This thesis simplifies the multi-stage generation model, draws on the advantages of the style-based StyleGAN generator, and redesigns it as a conditionally controlled text-to-image generation network. The model adds Region-Word Attention (RWA) between image sub-regions and text words to the generator to strengthen semantic relevance, and introduces a Matching-Aware zero-centered Gradient Penalty (MA-GP) into the objective function to smooth the loss surface and ensure the semantic consistency of image and text (see the sketch below). At the same time, a One-Way Discriminator (OWD) is built to cooperate effectively with the MA-GP term. To address the lack of constraints on image content, a Visual Feature Matching Loss (VFM Loss) focusing on image quality is proposed.

Extensive experiments on the CUB and MS-COCO datasets are conducted for the two models above. Through comparison of objective evaluation metric values and subjective analysis of the details of the generated images, the validity and superiority of the proposed models are demonstrated.
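The abstract names the DAS components but not their implementation, so the following is a minimal PyTorch sketch of the dual-attention idea: spatial attention lets each image sub-region attend over the text words, and channel attention lets each feature channel do the same. The module name, tensor shapes, word projections, and fusion by addition are illustrative assumptions, not the thesis's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualAttention(nn.Module):
    """Sketch of a DAS-style block: spatial (region-word) attention plus
    channel (channel-word) attention. Names and shapes are illustrative."""

    def __init__(self, feat_dim: int, word_dim: int, spatial_size: int):
        super().__init__()
        self.word_to_c = nn.Linear(word_dim, feat_dim)       # words -> channel space
        self.word_to_hw = nn.Linear(word_dim, spatial_size)  # words -> spatial space (H*W)

    def forward(self, img_feat: torch.Tensor, words: torch.Tensor) -> torch.Tensor:
        # img_feat: (B, C, H, W) image features; words: (B, T, word_dim) word embeddings
        B, C, H, W = img_feat.shape
        regions = img_feat.flatten(2).transpose(1, 2)         # (B, HW, C)

        # Spatial attention: each sub-region attends over all words.
        w_c = self.word_to_c(words)                           # (B, T, C)
        attn_s = F.softmax(regions @ w_c.transpose(1, 2), dim=-1)        # (B, HW, T)
        ctx_spatial = (attn_s @ w_c).transpose(1, 2).reshape(B, C, H, W)

        # Channel attention: each feature channel attends over all words.
        w_hw = self.word_to_hw(words)                         # (B, T, HW)
        attn_c = F.softmax(img_feat.flatten(2) @ w_hw.transpose(1, 2), dim=-1)  # (B, C, T)
        ctx_channel = (attn_c @ w_hw).reshape(B, C, H, W)

        # Fuse both word-context maps with the visual features (assumed additive fusion).
        return img_feat + ctx_spatial + ctx_channel

# Illustrative usage: a 16x16 feature map and 18 words of dimension 256.
das = DualAttention(feat_dim=64, word_dim=256, spatial_size=16 * 16)
fused = das(torch.randn(2, 64, 16, 16), torch.randn(2, 18, 256))  # (2, 64, 16, 16)
```

The additive fusion at the end is one simple choice; a real model might instead concatenate the context maps or gate them before the next generation stage.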
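DFPL is described only as a perceptual loss over dense features that constrains the text-irrelevant content of the generated image. A common way to realize such a loss is to compare generated and real images in the feature space of a frozen pretrained network; the sketch below assumes VGG16 feature layers and an L1 distance, both of which are illustrative choices not confirmed by the abstract.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class DenseFeaturePerceptualLoss(nn.Module):
    """Sketch of a DFPL-style loss: match generated and real images in the
    feature space of a fixed pretrained network, so text-irrelevant content
    (pose, background) stays close to the reference image. The backbone and
    tapped layers are assumptions for illustration."""

    def __init__(self, layer_ids=(3, 8, 15, 22)):  # relu1_2 .. relu4_3 in VGG16
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)  # the perceptual network is never trained
        self.vgg = vgg
        self.layer_ids = set(layer_ids)
        self.l1 = nn.L1Loss()

    def features(self, x: torch.Tensor):
        feats = []
        for i, layer in enumerate(self.vgg):
            x = layer(x)
            if i in self.layer_ids:
                feats.append(x)
        return feats

    def forward(self, fake: torch.Tensor, real: torch.Tensor) -> torch.Tensor:
        loss = torch.zeros((), device=fake.device)
        for f_fake, f_real in zip(self.features(fake), self.features(real.detach())):
            loss = loss + self.l1(f_fake, f_real)
        return loss
```

In a training loop this term would be weighted and added to the generator's adversarial loss, penalizing deviations that the text description does not explain.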
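MA-GP is described as a matching-aware zero-centered gradient penalty that smooths the loss surface around real images paired with matching text. The sketch below follows the form such penalties take in the related literature, penalizing the discriminator's gradient at (real image, matched sentence) points; the exact norm, exponent `p`, and weight `k` are assumptions, as is the `discriminator(images, sentences)` signature.

```python
import torch

def ma_gp(discriminator, real_imgs: torch.Tensor, sent_emb: torch.Tensor,
          k: float = 2.0, p: float = 6.0) -> torch.Tensor:
    """Matching-aware zero-centered gradient penalty (sketch).

    Penalizes the discriminator's gradient with respect to both the real
    image and its matching sentence embedding, flattening the loss surface
    around real/matched data points. k and p are illustrative defaults."""
    imgs = real_imgs.detach().requires_grad_(True)
    sents = sent_emb.detach().requires_grad_(True)
    scores = discriminator(imgs, sents)  # assumed: one adversarial score per pair
    grad_img, grad_sent = torch.autograd.grad(
        outputs=scores.sum(), inputs=(imgs, sents), create_graph=True)
    grad_norm = grad_img.flatten(1).norm(dim=1) + grad_sent.flatten(1).norm(dim=1)
    return k * grad_norm.pow(p).mean()
```

This term would be added to the discriminator loss at each step; pairing it with a one-way discriminator that outputs a single adversarial score means the penalty shapes one loss surface rather than separate conditional and unconditional ones.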
Keywords/Search Tags: generative adversarial networks, text-to-image generation, attention mechanism, residual structure, computer vision