Research On Multi-stage Text-to-image Synthesis Method Based On Generative Adversarial Network

Posted on: 2023-08-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Zhao
GTID: 2558307097994659
Subject: Computer technology
Abstract/Summary:
With the rapid development of generative adversarial networks (GANs), image generation techniques have made impressive progress. As an important area of image generation, text-to-image generation has been applied to real-world scenarios such as cross-modal information retrieval, advertising design, and video games. GAN-based text-to-image synthesis methods are mainly classified into single-stage and multi-stage methods, of which the multi-stage methods perform better in image generation. However, multi-stage methods still have problems: most rely heavily on the initial image layout, and they struggle to guide image generation effectively when the conditioning text is complex. These shortcomings lead to blurred structural edges and confusion among multiple objects in the generated images. This thesis addresses these problems.

First, to address the strong dependence of the generated image on the initially generated layout, this thesis proposes DMSA-GAN, a text-to-image generation method in which spatial attention works together with dynamic memory. The method first uses spatial attention to adjust the position of each pixel feature of the blurred image, and then uses memory to select the important word information for content adjustment. In addition, it introduces multi-information response gates that dynamically fuse memory information, spatial information, and current image information, so that new image features are refined jointly along the spatial dimension and the word-importance dimension. Experimental results on the CUB dataset show that DMSA-GAN improves the Inception Score (IS) by 0.11 and R-precision by 4.2% over the best existing baseline method.

Second, to address the difficulty of guiding image generation effectively with complex text, this thesis proposes DM-PGAN, a text-to-image generation method based on content awareness. The method introduces a content-aware loss function based on VGG19, taking image features from the relu5_4 activation layer of the perceptual network. The content-aware loss constrains only the last generator of the multi-stage text-to-image pipeline, and it encourages the generated image to approximate the real image by minimizing the Euclidean distance between the final generated image and the real image. Experimental results on the CUB and COCO datasets demonstrate that DM-PGAN alleviates, to a certain extent, the difficulty of controlling generated images with complex text.

Finally, to address the interaction between heterogeneous text and image data in the production of new energy vehicles, a cross-modal text-to-image generation system is designed and implemented. Built on the Spring Boot development framework, the system presents the research history, classical methods, and classical models to users, and shows the generation results of the proposed methods visually.
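
The content-aware loss described for DM-PGAN can be illustrated with a minimal sketch. It is not the thesis implementation. Several things here are assumptions: the distance is taken between feature maps and computed as a mean squared difference, the `weight` parameter and all function names are hypothetical, and `toy_features` is a small stand-in for the VGG19 relu5_4 activations so that the example runs without pretrained weights.

```python
import numpy as np


def toy_features(image: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for VGG19 relu5_4 features.

    Applies 2x2 average pooling followed by ReLU to a (C, H, W) image.
    A real setup would use a pretrained perceptual network instead.
    """
    c, h, w = image.shape
    pooled = image.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))
    return np.maximum(pooled, 0.0)


def content_aware_loss(fake, real, feature_fn=toy_features) -> float:
    """Mean squared Euclidean distance between the two feature maps."""
    diff = feature_fn(fake) - feature_fn(real)
    return float(np.mean(diff ** 2))


def total_generator_loss(stage_losses, fake_final, real, weight=1.0,
                         feature_fn=toy_features) -> float:
    """Sum of per-stage generator losses plus the content-aware term.

    Only the image from the last generator is passed to the
    content-aware term, matching the description in the abstract.
    """
    content = content_aware_loss(fake_final, real, feature_fn)
    return float(sum(stage_losses)) + weight * content


if __name__ == "__main__":
    real = np.ones((3, 4, 4))
    fake = np.zeros((3, 4, 4))
    print(content_aware_loss(fake, real))
    print(total_generator_loss([0.5, 0.25, 0.25], fake, real, weight=2.0))
```

In this sketch the earlier-stage generators contribute only their own losses, so the content term influences just the final, highest-resolution output.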
Keywords/Search Tags:Generative Adversarial Networks, Multi-stage Text-to-Image Generation, Spatial Attention, Dynamic Memory, Response Gate, Perceptual Layer