Research On Text-to-Image Algorithm Based On Single-Stage GAN

Posted on: 2024-05-14
Degree: Master
Type: Thesis
Country: China
Candidate: J Ma
Full Text: PDF
GTID: 2568306926468044
Subject: Engineering
Abstract/Summary:
Text-to-image synthesis uses deep learning to convert natural language text into images, and it is useful in many application scenarios. However, the quality of the generated images still needs improvement, especially for single-stage models. First, because the generation process of a single-stage model is random, the results are difficult to control: generated samples sometimes differ from the expected result or contradict prior knowledge. Second, training of single-stage models is often unstable, and problems such as mode collapse, non-convergence, and unrealistic outputs may occur. Finally, single-stage models can sometimes generate only a small number of samples with limited variability between them, so the model must be improved if more varied samples are needed.

In this thesis, we build on DF-GAN, a simple and effective baseline model for text-to-image synthesis, and propose a text-to-image model that combines spatial attention and conditional augmentation with an attribute storage mechanism. The model addresses three problems: semantic inconsistency between the text and the generated images, unstable training, and lack of diversity in the generated images. The specific contributions are as follows.

To address the lack of diversity in the generated images, an Attribute Storage Model is added. Existing sentence embedding methods model the textual representation using only the limited information in a single sentence, so they omit some key attribute descriptions that are essential for describing an image accurately. The Attribute Storage Model is a text representation approach that complements the sentence with attribute information. It stores attribute information of the data samples, such as the colour, shape, and texture of an image, and this information can be used to control the image features produced by the generator. In addition, an extra layer of Affine blocks is added to the original DF-Block module, which helps fit the image distribution from the text distribution, increases the diversity of visual features, and expands the representation space.

To improve the stability of training and increase the diversity of the generated images, a Conditional Augmentation Model is added to the original model. It improves the diversity of the generated images, stabilises the training of the network, and makes the latent conditioning smoother.

To improve the semantic consistency between the text and the synthesised images, a Spatial Attention Model is added to the discriminator. Not all regions of an image contribute equally to a task; only the task-relevant regions need to be attended to, for example the main subject in a classification task. The Spatial Attention Model identifies the most important regions for the network to process. By transforming and adjusting local regions of the input image, it achieves a more fine-grained processing and representation of the image's spatial information.

Experimental results show that the proposed model produces images of significantly improved quality. IS improved by 2.5% and 3.2% on the CUB and Oxford-102 datasets respectively, and FID was reduced by 25.9% and 13.3% on the CUB and COCO datasets respectively. This demonstrates that the Spatial Attention and Conditional Augmentation text-to-image model with the Attribute Storage Mechanism produces images that are more diverse and closer to real images.
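The abstract does not give the formulation of the Conditional Augmentation Model. As a minimal sketch, assuming it follows the conditioning augmentation commonly used in text-to-image GANs (a Gaussian is parameterised from the sentence embedding, the condition vector is sampled from it, and a KL term pulls the Gaussian towards the standard normal), the idea can be illustrated as below. The function names and the inputs `mu` and `log_var` are hypothetical; in a real model they would be produced by a learned layer applied to the sentence embedding.

```python
import math
import random


def conditional_augmentation(mu, log_var, rng):
    """Sample c = mu + sigma * eps with eps ~ N(0, I).

    mu and log_var are assumed to be per-dimension outputs of a learned
    layer applied to the sentence embedding (hypothetical here).
    """
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), used as a regulariser."""
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    mu = [0.2, -0.1, 0.5]        # illustrative values only
    log_var = [-1.0, -1.0, -1.0]  # illustrative values only
    # Two draws for the same sentence give two different condition vectors,
    # which is where the extra diversity comes from.
    c1 = conditional_augmentation(mu, log_var, rng)
    c2 = conditional_augmentation(mu, log_var, rng)
    print(c1, c2, kl_to_standard_normal(mu, log_var))
```

Under these assumptions, sampling the condition instead of using a fixed embedding gives the generator many nearby conditions per sentence, and the KL term keeps the conditioning distribution smooth, which matches the stability and diversity effects the abstract describes.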
Keywords/Search Tags: Text-generated images, DF-GAN, Conditional Augmentation Model, Affine Block, Spatial Attention Model, Attribute Storage Model