Image Translation Based On Generative Adversarial Networks

Posted on: 2023-08-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R Li
Full Text: PDF
GTID: 1528307025964829
Subject: Signal and Information Processing
Abstract/Summary:
In recent years, computer vision has gained wide attention with the rapid development of computing devices and the improvement of computing power. Image translation is an emerging field in computer vision that aims to learn the relationship between different visual domains and to realize the mapping between them. It is widely applied in industry and everyday life. For example, image translation can generate images or videos in specific styles for visual effects in TV and movies, which may greatly reduce the time spent on manual drawing. In intelligent traffic and security surveillance, images can be converted from the input domain to a label domain to better locate people or vehicles. In medical imaging, translated images can help locate lesion areas.

At present, the main research direction of image translation is the search for algorithms that generate high-quality images. Image translation needs to maintain the content of source domain images while learning the characteristics of target domain images. However, current image translation algorithms may suffer from several problems, including loss of important details, weak acquisition of effective information, ghosting artifacts when fusing multiple inputs, and illumination distortion. The rapid development of Generative Adversarial Networks (GAN) has provided new approaches to image translation. GAN offers a high degree of freedom in network design and shows excellent performance in generating diverse and interpretable data.

This dissertation chooses GAN as the basic framework and conducts research to address the current problems in image translation. First, it designs a single-input image translation algorithm based on saliency detection to address the loss of important details. Then, it introduces image reorganization as a pre-training task to perceive image content information. After that, the single-input setting is extended to multiple inputs: an image fusion-based multi-input image translation algorithm is proposed to solve the ghosting problem, and an intrinsic decomposition-based multi-input image translation algorithm is designed to solve the illumination distortion. The main contributions are as follows.

First, to address the loss of important details, saliency detection is introduced into image translation to ensure that the generated images preserve details well in salient regions. Based on the classical GAN framework, a saliency detection network is introduced in parallel with the generator to produce a saliency mask. This saliency mask is used to optimize the saliency network on the one hand, and to constrain the generator to keep the saliency information of the generated image on the other. Compared with the classical image translation algorithms, the FID (Fréchet Inception Distance) metric is improved by about 15%.

Second, to address the weak acquisition of effective information during image translation, this dissertation proposes image reorganization as a pre-training task to perceive image content information, and transfers the parameters of the image reorganization network to the image translation network to improve its performance. For the image reorganization network, a dual-branch GAN-based framework is designed, with the two branches focusing on local and global information, respectively. The framework can effectively reorganize shuffled image patches. The image translation network is then trained starting from the parameters of the image reorganization network. The experimental results demonstrate that transferring the parameters of the image reorganization network to the aforementioned saliency detection-based image translation framework yields about an 8% improvement in the FID metric.

Third, to address the ghosting artifacts that arise when fusing multiple inputs, a multi-input generator and a discriminator based on a minimum pooling module are designed so that the network obtains more scene information from multiple inputs while effectively dealing with the ghosting artifacts caused by dynamic objects in the scene. The generator maps multiple inputs to a single output, and the output is more informative than any single input. The discriminator based on the minimum pooling module addresses the blurring and ghosting artifacts caused by dynamic objects in the feature fusion process. Compared with the aforementioned single-input image reorganization-based algorithm, the multi-input method achieves an 11% improvement in the FID metric. Compared with the classical multi-input image translation algorithms, the framework achieves the highest PSNR values.

Fourth, to address illumination distortion, a multi-input image translation framework based on intrinsic decomposition is proposed so that the network focuses better on scene illumination information. Losing the scene illumination information leads to unreasonable outputs. Intrinsic decomposition algorithms decompose an image into a reflection map and a shading map, where the shading map can effectively represent the highlights and shadows formed by the illumination. The architecture first uses two encoders to extract the features of the reflection map and the shading map, and then uses two decoders to recover these features into the target image and the illumination map, respectively. The illumination map is used to preserve the highlight and shadow information of the generated images. Compared with the aforementioned single-input image reorganization-based algorithm, the FID metric is improved by about 4%. Compared with the classical multi-input image translation algorithms, the FID metric is improved by about 10%.

This dissertation applies GAN to fit the data distributions of the source and target domains and addresses key problems of image translation algorithms. The resulting high-quality image translation algorithms provide a new direction for image translation research.
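
The image reorganization pre-training task mentioned in the abstract can be illustrated with a minimal sketch. It only shows how a shuffled-patch training pair might be built from an image and how a known permutation restores the original; it is not the dissertation's dual-branch network. The function names, the grid size, and the choice of the permutation as the training target are all assumptions made for illustration, and the image is a plain nested list so the example needs no external libraries.

```python
import random


def split_into_patches(image, grid):
    """Split an H x W image (list of rows) into grid*grid patches, row-major.

    Assumes H and W are divisible by `grid` (an illustrative simplification).
    """
    h, w = len(image), len(image[0])
    ph, pw = h // grid, w // grid
    patches = []
    for gy in range(grid):
        for gx in range(grid):
            rows = image[gy * ph:(gy + 1) * ph]
            patches.append([row[gx * pw:(gx + 1) * pw] for row in rows])
    return patches


def assemble(patches, grid):
    """Inverse of split_into_patches: stitch row-major patches into an image."""
    image = []
    for gy in range(grid):
        row_patches = patches[gy * grid:(gy + 1) * grid]
        for y in range(len(row_patches[0])):
            line = []
            for patch in row_patches:
                line.extend(patch[y])
            image.append(line)
    return image


def make_reorganization_sample(image, grid, seed=None):
    """Return (shuffled_image, perm); position i holds original patch perm[i]."""
    rng = random.Random(seed)
    patches = split_into_patches(image, grid)
    perm = list(range(len(patches)))
    rng.shuffle(perm)
    shuffled = [patches[i] for i in perm]
    return assemble(shuffled, grid), perm


def restore(shuffled_image, perm, grid):
    """Undo the shuffle given the permutation (what a trained network would predict)."""
    shuffled = split_into_patches(shuffled_image, grid)
    restored = [None] * len(perm)
    for position, source_index in enumerate(perm):
        restored[source_index] = shuffled[position]
    return assemble(restored, grid)


# Hypothetical 4 x 4 single-channel image split into a 2 x 2 grid of patches.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
shuffled_image, perm = make_reorganization_sample(image, grid=2, seed=0)
recovered = restore(shuffled_image, perm, grid=2)
```

In such a setup, a network would receive `shuffled_image` and be trained to output the reorganized image (or the permutation), after which its parameters could be reused to initialize the translation network, as the abstract describes at a high level.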
Keywords/Search Tags: Image Translation, Deep Learning, Generative Adversarial Networks, Detail Preservation, Information Fusion