
Research on Image-to-Image Translation Algorithms Based on CycleGAN

Posted on: 2020-09-28    Degree: Master    Type: Thesis
Country: China    Candidate: S. Dai
GTID: 2428330590473891    Subject: Computer Science and Technology
Abstract/Summary:
Image-to-image translation can be defined as the task of translating one possible representation of a scene or object into another, given sufficient training data. Many problems in image processing, computer graphics, and computer vision can be seen as image translation problems; examples include, but are not limited to, image colorization, image segmentation, and image super-resolution. Conventionally, these image-to-image translation tasks have been tackled separately because of their intrinsic disparities. It is only in the past two years that general-purpose, end-to-end deep learning frameworks have been developed to enable a unified treatment of these tasks; however, they require paired training examples, and paired data is difficult and expensive to obtain. Unsupervised image-to-image translation tackles the unpaired setting, but existing methods treat object transfiguration as a generic image-to-image task without investigating the unique characteristics of the problem. These approaches are sometimes able to translate the foreground of input images, but they generally also affect the background in unwanted ways, leading to unrealistic translations, for example generating texture patterns of the target objects in the background or changing the background's color. We believe these problems have two causes: the network's bottleneck layer loses low-level information, and the network itself has no explicit attention mechanism.

In this paper, we research image-to-image translation based on CycleGAN. Aiming at the problems CycleGAN exhibits on object transfiguration, we propose three improvements (minimal sketches of each are given after this abstract). First, we use a skip connection to connect the input and output of the generator, so that low-level information can be transmitted directly to the output; this solves the background color distortion problem. Second, building on a study of the U-net structure and residual networks, we perform residual transformations at multiple resolutions, making full use of the advantages of the residual network and the U-net structure so that each compensates for the other's shortcomings and the translation quality improves. Third, building on a study of attention mechanisms, we factorize the generator of a classical GAN into two separate networks: an attention network that predicts where attention should be paid, and a transformation network that actually carries out the transformation of objects. This solves the problem of generating the target texture pattern in the wrong position and the problem of the generated image's background being inconsistent with that of the input. In addition, we use scale (resize) convolution, that is, upsampling followed by a plain convolution, instead of deconvolution to suppress checkerboard artifacts.

We verify our models on two datasets, horse2zebra and orange2apple. Compared with the base model, the skip connection solves the background color distortion problem and speeds up convergence, but it does not overcome the shortcoming of the cycle-consistency assumption: the texture pattern of the target object is still generated in the background. The combination of the U-net structure and residual blocks not only preserves the original background information but also reduces generation of the target texture pattern in the background area. Introducing the attention network preserves the original background while translating the foreground better, and effectively solves the problem of generating the target texture pattern in the background.
With the attention network, the mean Fréchet Inception Distance (FID) on the horse-to-zebra, zebra-to-horse, apple-to-orange, and orange-to-apple translation tasks is reduced by 14.29, 12.73, 39.94, and 44.08 respectively compared with the base model, although on the zebra-to-horse and apple-to-orange tasks the mean FID is 19.8 and 18.76 higher than that of the Attention-guided GAN model (NIPS 2018).
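To make the first improvement concrete, the following is a minimal PyTorch-style sketch of a generator with a global skip connection from input to output. The class name and layer sizes are illustrative assumptions, not the architecture used in the thesis:

import torch
import torch.nn as nn

class SkipGenerator(nn.Module):
    """Generator whose output is the input plus a learned change, so
    low-level information (color, layout) reaches the output directly."""
    def __init__(self, channels=3, width=64):
        super().__init__()
        # Hypothetical translation body; the thesis builds on a
        # CycleGAN-style generator instead.
        self.body = nn.Sequential(
            nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, channels, 3, padding=1),
        )

    def forward(self, x):
        # Global skip connection: the input is added to the body's output,
        # which is what addresses background color distortion.
        return torch.tanh(self.body(x) + x)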
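The second improvement, as described, applies residual transformations at each resolution inside a U-net-shaped generator. A minimal sketch under that reading, with a single downsampling level and hypothetical channel widths:

import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.conv(x)  # residual transformation at this resolution

class ResUNet(nn.Module):
    """U-net whose levels each carry residual blocks, combining U-net
    detail preservation with residual learning."""
    def __init__(self, ch=3, w=32):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(ch, w, 3, padding=1), ResBlock(w))
        self.down = nn.Conv2d(w, 2 * w, 4, stride=2, padding=1)  # halve resolution
        self.mid = ResBlock(2 * w)                               # residual block at low resolution
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.fuse = nn.Conv2d(3 * w, w, 3, padding=1)            # merge skip features + upsampled features
        self.dec1 = nn.Sequential(ResBlock(w), nn.Conv2d(w, ch, 3, padding=1))

    def forward(self, x):
        e1 = self.enc1(x)
        m = self.mid(self.down(e1))
        u = self.up(m)  # back to full resolution
        return torch.tanh(self.dec1(self.fuse(torch.cat([e1, u], dim=1))))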
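The attention factorization can be sketched as follows: the attention network predicts a soft foreground mask, the transformation network translates the whole image, and the mask composites the two so that background pixels are copied from the input. The compositing rule output = mask * translated + (1 - mask) * input is an assumption consistent with the description above, not a quotation of the thesis:

import torch.nn as nn

class AttentionGenerator(nn.Module):
    """Generator factorized into an attention network and a transformation
    network; only attended regions are translated, the rest is kept."""
    def __init__(self, ch=3, w=32):
        super().__init__()
        # Hypothetical small networks; the thesis would build both on
        # full CycleGAN-style generators.
        self.attention = nn.Sequential(
            nn.Conv2d(ch, w, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(w, 1, 3, padding=1), nn.Sigmoid(),  # soft mask in [0, 1]
        )
        self.transform = nn.Sequential(
            nn.Conv2d(ch, w, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(w, ch, 3, padding=1), nn.Tanh(),
        )

    def forward(self, x):
        mask = self.attention(x)        # where the translation should be applied
        translated = self.transform(x)  # what the translated content looks like
        # Background pixels come directly from the input, keeping the
        # generated background consistent with the original.
        return mask * translated + (1.0 - mask) * x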
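For the checkerboard fix, a sketch contrasting a strided deconvolution with the scale (resize) convolution described above; the kernel sizes are illustrative:

import torch.nn as nn

def deconv_upsample(ch_in, ch_out):
    # Strided transposed convolution: unevenly overlapping kernel
    # footprints can produce checkerboard artifacts in the output.
    return nn.ConvTranspose2d(ch_in, ch_out, kernel_size=4, stride=2, padding=1)

def resize_conv_upsample(ch_in, ch_out):
    # Scale (resize) convolution: fix the upsampling (nearest-neighbor),
    # then let a plain convolution fill in detail; kernel footprints
    # overlap evenly, which suppresses the checkerboard pattern.
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode="nearest"),
        nn.Conv2d(ch_in, ch_out, kernel_size=3, padding=1),
    )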
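For reference, FID, the metric used in the evaluation above, compares Gaussian fits to Inception features of real and generated images; lower is better. Its standard definition is

\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)

where (\mu_r, \Sigma_r) and (\mu_g, \Sigma_g) are the means and covariances of Inception features computed over real and generated images, respectively.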
Keywords/Search Tags: unsupervised image-to-image translation, attention mechanism, CycleGAN, U-net, residual network