
Image-to-Image Translation Based on Generative Adversarial Networks

Posted on: 2024-04-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
Full Text: PDF
GTID: 2568307079970899
Subject: Electronic information
Abstract/Summary:
In recent years, the field of computer vision has made great progress, driven by advances in hardware computing power and deep learning algorithms. Many traditional vision tasks that were once difficult or even unimaginable to solve have been addressed by this wave of development, and image translation, one of the most widely applied tasks, has advanced significantly. This thesis presents a systematic study of two typical tasks in the field of image translation: instance-level image-to-image translation and text-to-image translation. Specifically, instance-level image-to-image translation transforms individual instances in an image from one category to another while preserving the original background and ensuring that the generated image has good visual quality. Text-to-image translation generates an image corresponding to the input text. After surveying relevant domestic and international work, this thesis focuses on instance-level image-to-image translation in complex scenes and on text-to-image translation in e-commerce scenarios. To address the insufficient shape variation and problem representation in the former, and the poor fit of existing techniques for e-commerce scenarios together with their high computational cost in the latter, this thesis proposes a comprehensive optimization design spanning model design, construction of supervisory information, and component optimization. In short, the innovations of this thesis can be summarized in the following three aspects:

For the unsupervised instance-level image-to-image translation task, this thesis proposes MGD-GAN++, a mask-guided deformable instance-level image translation algorithm that further explores and exploits the guiding role of mask information in the generation process. The algorithm builds on MGD-GAN, which divides the task into
mask deformation and mask-guided image generation. For the mask deformation stage, a new method of constructing supervision data, alignment supervision, is introduced, which makes fuller use of the mask information in the dataset. For mask-guided image generation, a mask-aware discriminator that focuses on the foreground is introduced at the model-design level, further improving the model's foreground generation quality.

For text-to-image translation in e-commerce scenarios, this thesis designs LM-TIM, a novel modular text-to-image translation model based on large-scale pre-trained language models. The algorithm constructs a dedicated text feature extractor tailored to the distinctive text style of e-commerce scenarios. Building on the strong text-understanding ability of large-scale pre-trained language models, combined with the high-quality image generation ability of diffusion models, it completes the overall algorithm design. The system adopts a modular design that reduces the difficulty of each single-stage task while optimizing the overall pipeline; this lowers training-time cost and keeps the overall functionality easy to adjust and extend.

This thesis conducts extensive experiments on multiple mainstream datasets and demonstrates the performance advantages of the proposed methods from multiple perspectives, including qualitative visual quality and quantitative metrics.
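The mask-aware discriminator mentioned above can be illustrated with a minimal sketch. The abstract does not detail MGD-GAN++'s actual architecture, so everything below — the function name, the patch-logit formulation, and the weighting scheme — is a hypothetical illustration of the general idea only: weighting a patch discriminator's per-location real/fake loss by the foreground mask, so that the discriminator's feedback concentrates on the instance region rather than the unchanged background.

```python
import numpy as np

def mask_aware_d_loss(d_logits, mask, is_real):
    """Hypothetical mask-aware discriminator loss (illustration only).

    d_logits : (H, W) array of per-patch discriminator logits.
    mask     : (H, W) binary foreground mask from the dataset.
    is_real  : True if d_logits were computed on a real image.

    Per-patch binary cross-entropy, weighted by the mask so that only
    foreground patches contribute to the discriminator's loss.
    """
    p = 1.0 / (1.0 + np.exp(-d_logits))   # sigmoid: patch "real" probability
    target = 1.0 if is_real else 0.0
    eps = 1e-7
    bce = -(target * np.log(p + eps) + (1.0 - target) * np.log(1.0 - p + eps))
    # Average the per-patch loss over foreground patches only.
    return float((mask * bce).sum() / (mask.sum() + eps))

# Toy example: a discriminator that is confident "real" everywhere.
logits = np.full((4, 4), 3.0)
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0                      # 2x2 foreground instance

loss_real = mask_aware_d_loss(logits, mask, is_real=True)   # low loss
loss_fake = mask_aware_d_loss(logits, mask, is_real=False)  # high loss
```

Because the mask zeroes out background patches, a generator can leave the background untouched without being penalized, and the discriminator's capacity is spent on judging the translated foreground instance.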
Keywords/Search Tags:Generative Adversarial Networks, Image-to-Image Translation, Self-Supervised Learning, Diffusion Model, Modular Design