Research On GAN Translation From Sketch To Real Image Based On Perceptual Attention And Latent Space Regularization

Posted on: 2021-05-28
Degree: Master
Type: Thesis
Country: China
Candidate: P C Wang
Full Text: PDF
GTID: 2428330620465845
Subject: Computer Science and Technology
Abstract/Summary:
Deep learning is an active research area that spans fields such as intelligent search, data mining, natural language processing, and image and speech processing. Using Generative Adversarial Networks (GANs) to generate images is a widely pursued topic in the image field. This thesis studies the translation of sketch images to real-world images.

First, the research focuses on the mutual translation between sketches of natural landscapes and real-world landscape images. Here the translation between sketches and natural landscape images is one-to-one. A sketch has simple lines and a single color, while the target landscape image contains many scenes and complex colors, which makes this translation challenging. Building on the sketch-to-landscape translation, we also studied the translation on several other data sets.

At the same time, one sketch may correspond to real images in a variety of possible styles, so the translation to the real image is not one-to-one but one-to-many. For example, the same face sketch may correspond to real images with different skin tones and hair colors. Therefore, on the basis of the one-to-one translation, we use latent code vectors to generate a variety of possible outputs and achieve one-to-many translation. In general, our main work is as follows:

(1) We designed an automatic translation network, named Sparse Residual Attention Generate Adversarial Networks (SRAGAN), to translate sketches into natural landscape images. In the loss function, in addition to the adversarial loss and the L1-norm per-pixel loss, we add an L1-norm perceptual loss to reduce the perceptual difference between the original image and the generated image and to improve the quality of the generated image.

(2) The generator uses an encoder, residual block, decoder structure. An attention mechanism is embedded in the residual block. It combines spatial attention and channel attention, and therefore the feature relationships across image channels and across image space, so that the model can focus on more "useful" information when extracting features. Finally, a "shortcut" connection is added between the input image and the output image to better retain the image features.

(3) To produce a variety of outputs, this thesis adds latent code vectors to the input of the SRAGAN structure above, achieving a one-to-many translation from the input sketch to the output real image, including sketch-to-bag, sketch-to-shoe, and sketch-to-face translation. The latent code vector comes from two latent spaces: one is the standard normal distribution, and the other is the distribution produced by passing the source-domain image through the encoder. Using only latent code vectors drawn from the standard normal distribution causes the generated image to lack information from the source-domain image, while using only the distribution produced by the encoder makes it hard to sample the latent code vector at test time. Therefore, this thesis combines the two latent code vectors. Because the method is based on latent space regularization and the generative adversarial network with perceptual attention, this thesis names the multi-modal image generation method LSRAGAN.

(4) Based on the original loss function of SRAGAN, we combine multiple perceptual losses into a joint perceptual loss that measures perceptual differences more comprehensively. To address the mode collapse problem, in which the generated images tend to be the same, this thesis proposes a penalty that regularizes the generator to increase the importance of the latent code vector, so that sampling different latent code vectors generates different and more diverse outputs. We call it the latent space regularization loss.

(5) Extensive experiments show that our proposed method generates more realistic images and performs better than other related methods...
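
The loss terms described in (1) and (4) can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything here is an assumption: the function names, the weights lambda_pixel and lambda_perc, the toy feature extractor, and in particular the ratio form of the latent penalty (a form commonly used against mode collapse, not necessarily the one used in the thesis). Images and latent codes are flattened to plain lists of floats to keep the example self-contained.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def l1_distance(a: Vector, b: Vector) -> float:
    """Mean absolute difference between two equally sized vectors."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def generator_loss(
    adv_loss: float,
    fake: Vector,
    real: Vector,
    feature_extractors: Sequence[Callable[[Vector], Vector]],
    lambda_pixel: float = 10.0,  # assumed weight, not from the thesis
    lambda_perc: float = 5.0,    # assumed weight, not from the thesis
) -> float:
    """Adversarial loss + L1 per-pixel loss + joint L1 perceptual loss."""
    pixel = l1_distance(fake, real)
    # One extractor gives a single perceptual loss as in (1);
    # several extractors give a joint perceptual loss as in (4).
    perceptual = sum(l1_distance(f(fake), f(real)) for f in feature_extractors)
    return adv_loss + lambda_pixel * pixel + lambda_perc * perceptual


def latent_regularization(
    out1: Vector, out2: Vector, z1: Vector, z2: Vector, eps: float = 1e-8
) -> float:
    """Assumed penalty: large when two different latent codes yield near-identical outputs."""
    return l1_distance(z1, z2) / (l1_distance(out1, out2) + eps)


def toy_features(v: Vector) -> Vector:
    """Hypothetical stand-in for a pretrained feature network."""
    return [sum(v), max(v)]


if __name__ == "__main__":
    print(generator_loss(0.7, [0.0, 1.0], [1.0, 1.0], [toy_features]))
    print(latent_regularization([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]))
```

In this sketch the penalty grows when the generator ignores the latent code: two distinct codes that map to the same output drive the denominator toward zero, so minimizing the term pushes the outputs apart.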
Keywords/Search Tags: Generative Adversarial Networks, Attention Mechanism, Perceptual Loss, Latent Space Regularization, Multi-modal, Image Translation