
Research On Text To Image Synthesis Based On Generative Adversarial Networks

Posted on: 2021-03-27 | Degree: Master | Type: Thesis
Country: China | Candidate: Z W Xiang | Full Text: PDF
GTID: 2428330611460717 | Subject: Software engineering
Abstract/Summary:
Text-to-image synthesis is a technology that converts a descriptive text into an image. It requires the computer not only to understand the semantic information expressed by the text, but also to convert that semantic information into matching image information, which makes it a challenging task. With the rapid development of Generative Adversarial Networks (GANs) in recent years, their powerful generative ability for unsupervised learning has quickly won the favor of researchers. The adversarial training scheme gives GANs stronger feature-learning and feature-representation capabilities than traditional machine learning algorithms, and they have been widely applied in natural language processing, computer vision, and other fields, all of which lays the foundation for the task of generating images from text. To improve the quality of images generated from text, this paper improves on the existing text-to-image model StackGAN-v2. The innovations of this paper are as follows:

(1) To address the limitation that the local receptive field of the convolution operator in traditional GANs prevents the model from handling long-range, multi-level dependencies well, a stacked generative adversarial model combined with a self-attention mechanism is proposed, allowing the model to use detail cues from all positions to coordinate the fine details at each position. A feature reconstruction loss is also introduced in the generator to ensure semantic consistency between generated samples and real samples. Finally, to address the training instability of the original model, spectral normalization is introduced into the discriminator: the Lipschitz constant of the discriminator is constrained by limiting the spectral norm of the weight matrix of each of its layers, which improves training stability.

(2) To address the problem that the high-resolution images generated by the original model lack fine-grained details, a stacked generative adversarial network combined with a multi-objective loss is proposed. Adding an image class-information loss to the original model lets the model automatically learn shared features of similar images, which to a certain extent compensates for feature information that is difficult to capture in high-resolution images. The unconditional-loss discriminator of the original model is also improved, so that the discriminator branch taking low-resolution images as input attends to the global structure of the image, while the branch taking high-resolution images as input attends to its local details. Finally, to prevent global distortion in the resulting high-resolution images, a pixel loss is introduced into the generator.
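As a concrete illustration of the spectral-normalization step mentioned above (this is a minimal NumPy sketch, not the thesis code; the matrix shape and iteration count are arbitrary): power iteration estimates the largest singular value of a discriminator weight matrix, and dividing the weights by that value constrains the layer's spectral norm to 1, which bounds the Lipschitz constant of the linear map.

```python
import numpy as np

def spectral_norm(W, n_iter=100):
    """Estimate the largest singular value of W by power iteration."""
    u = np.ones(W.shape[0]) / np.sqrt(W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    return float(u @ W @ v)

# Illustrative discriminator layer weight (shape chosen for the example).
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 128))

sigma = spectral_norm(W)
W_sn = W / sigma  # normalized weight: largest singular value is ~1
```

In practice, deep learning frameworks apply this per layer during training and typically run only a single power-iteration step per update, reusing the vectors between steps; the many iterations here are only to make the toy estimate accurate.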
Keywords/Search Tags: text-to-image synthesis, Generative Adversarial Networks, self-attention mechanism, multi-objective loss