
Research On Text-to-image Generation Based On Multi-stage And Multi-task Generative Adversarial Network

Posted on: 2022-09-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Xue
GTID: 2518306605465794
Subject: Signal and Information Processing
Abstract/Summary:
Image generation based on deep learning is a widely studied topic in computer vision. Text-to-Image Generation uses image generation techniques to produce images that match a given text, and it lies at the intersection of computer vision and natural language processing. Because of the differences between the two modalities, the task faces many difficulties and challenges. On the one hand, the Generative Adversarial Network (GAN), the mainstream generative model for this task, suffers from unstable training, low-quality results, and insufficient diversity. Some researchers guide the generation process step by step through a multi-stage network, which makes training more stable, but the cascade structure accumulates errors, and the final image depends on the quality of the initial image. On the other hand, Text-to-Image Generation is a weakly supervised task, and the supervision provided by the GAN is biased, so the semantic consistency between the image and the text cannot be guaranteed. To generate high-quality, large-size images, this thesis addresses the following two issues and achieves good results.

(1) To address the cumulative error of the multi-stage Generative Adversarial Network and its dependence on initialization quality, this thesis simulates forgetting behavior through a dual structure of down-sampling and up-sampling layers, making the network's learning more intelligent. In addition, it modifies the generation process for test images: a discriminator is introduced to identify poor-quality initial images, which are then regenerated to ensure the quality of the initial image. The two methods are integrated into a Multi-stage Adaptive Generative Adversarial Network. Qualitative and quantitative experiments on the CUB and Oxford datasets show that this model outperforms the other models it is compared with.

(2) To address the problem of biased guidance in Text-to-Image Generation, this thesis applies a multi-task learning strategy and adds an auxiliary task to the discriminator, which makes the guidance more refined and effectively improves the semantic consistency between the image and the corresponding text. Experiments show that the Multi-stage and Multiguidance Adaptive Generative Adversarial Network designed in this thesis performs better on the CUB and Oxford datasets than the same network trained with a single-task strategy.
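The regeneration step in (1) can be sketched as a simple retry loop. The abstract does not give the acceptance rule, so the score threshold, the attempt limit, and the keep-the-best fallback below are illustrative assumptions, and `generate_initial_image`, `toy_generator`, and `toy_score` are hypothetical names rather than the thesis's implementation. The text conditioning is assumed to be folded into the generator callable.

```python
import random
from typing import Callable, List, Tuple

Image = List[float]  # stand-in for a low-resolution image tensor


def generate_initial_image(
    generator: Callable[[random.Random], Image],
    score: Callable[[Image], float],
    threshold: float = 0.5,   # assumed acceptance threshold
    max_attempts: int = 5,    # assumed retry budget
    seed: int = 0,
) -> Tuple[Image, float, int]:
    """Resample the first-stage image until the discriminator score is acceptable.

    Returns the best image seen, its score, and the number of attempts used.
    """
    rng = random.Random(seed)
    best_image: Image = []
    best_score = float("-inf")
    attempts = 0
    for attempts in range(1, max_attempts + 1):
        candidate = generator(rng)      # draw a new noise vector, generate again
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best_image, best_score = candidate, candidate_score
        if candidate_score >= threshold:
            break                       # good enough to pass to later stages
    return best_image, best_score, attempts


# Toy stand-ins so the sketch runs without a trained model (hypothetical).
def toy_generator(rng: random.Random) -> Image:
    return [rng.random() for _ in range(4)]


def toy_score(image: Image) -> float:
    return sum(image) / len(image)


if __name__ == "__main__":
    image, quality, used = generate_initial_image(toy_generator, toy_score)
    print(f"accepted after {used} attempt(s), score={quality:.3f}")
```

Keeping the best candidate when no attempt clears the threshold bounds the extra inference cost while still returning the highest-scoring initial image that was seen.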
Keywords/Search Tags: Generative Adversarial Network, Multi-task Learning, Text-to-Image Generation, Multi-stage, Cross Domain