Research On Text-to-Image Generation Technology Based On Generative Adversarial Networks

Posted on: 2023-09-08
Degree: Master
Type: Thesis
Country: China
Candidate: S Han
GTID: 2568306773459764
Subject: Master of Engineering
Abstract/Summary:
Text-to-image research lies at the intersection of natural language processing and computer vision. The task requires a computer to understand the semantics of an input text and to generate images consistent with those semantics. Among traditional image generation methods, the variational autoencoder produces blurred images that cannot fully express the text semantics. This thesis uses generative adversarial networks to address these problems and achieve text-to-image synthesis. Although some researchers have already used generative adversarial networks to generate images that conform to text semantics, the diversity and clarity of the images still cannot meet the requirements of practical applications. Building on a summary of existing methods, this thesis proposes two text-to-image algorithms based on generative adversarial networks to improve both the semantic relevance between text and image and the quality of the generated images.

(1) Variable-scale pyramid attention generative adversarial network for text-to-image synthesis

Since the quality of the high-resolution images in a multi-stage generative model depends on the quality of the low-resolution images, this algorithm focuses on improving the quality of the initial image. To mine the text-expressive capacity of the initial image more deeply and to support image generation, a variable-scale pyramid attention module is designed. The module extracts features of the initial image at different scales and uses attention to exploit the useful information across feature channels efficiently. In addition, to strengthen the feature extraction ability of the refinement network, a residual dense feature extraction module is proposed. Through dense connections and residual connections, it makes full use of the features of every layer, enables deeper information interaction, refines the initial image, and improves the quality of the high-resolution image. Experiments on the CUB and COCO datasets show that the algorithm outperforms previous models.

(2) Cross-domain feature fusion generative adversarial network for text-to-image synthesis

This algorithm uses a multi-stage generative adversarial network as its main framework to raise the image from low resolution to high resolution. To improve the semantic correlation between the generated image and the text description, a feature fusion enhanced response module is designed. It deeply fuses the features of the low-resolution initial image with word-level vector features, so that the image expresses not only the semantics of the sentence but also word-level semantics accurately. At the same time, to make the target object in the image complete and rich in texture and structure, a multi-branch residual module is designed. It replaces multiple simply structured residual modules and extracts the texture features of the image more fully, making the image more realistic. Experimental results on the CUB and COCO datasets show that the algorithm improves on the previous model's performance by at least 1.5% in terms of Inception Score and R-precision.

The two algorithms improve image quality from two different perspectives: improving the quality of the initial image, and improving the semantic correspondence between text and image. Both successfully synthesize images that are consistent with the text semantics, and they provide a reference for subsequent research on text-to-image generation.
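
The two evaluation metrics named in the abstract, Inception Score and R-precision, can be illustrated with a minimal sketch. This is not the thesis's evaluation code. It assumes that per-image class probabilities (normally produced by a pretrained Inception classifier) and image/text embedding vectors are already available as plain Python lists. The function names `inception_score`, `cosine`, and `r_precision` are hypothetical, and the R-precision shown is a simplified variant with a single ground-truth caption per image.

```python
import math


def inception_score(class_probs):
    """Inception Score: exp of the mean KL(p(y|x) || p(y)) over all images.

    class_probs is a list of per-image class probability vectors p(y|x).
    """
    n = len(class_probs)
    k = len(class_probs[0])
    # Marginal p(y): average of the conditional distributions over all images.
    marginal = [sum(p[j] for p in class_probs) / n for j in range(k)]
    # Zero-probability terms contribute nothing to the KL divergence.
    mean_kl = sum(
        sum(pj * math.log(pj / marginal[j]) for j, pj in enumerate(p) if pj > 0)
        for p in class_probs
    ) / n
    return math.exp(mean_kl)


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def r_precision(image_vec, text_vecs, true_index, r=1):
    """Return 1.0 if the ground-truth caption is among the top-r candidate
    captions ranked by cosine similarity to the image embedding, else 0.0.

    Assumption: exactly one ground-truth caption per image.
    """
    ranked = sorted(
        range(len(text_vecs)),
        key=lambda i: cosine(image_vec, text_vecs[i]),
        reverse=True,
    )
    return 1.0 if true_index in ranked[:r] else 0.0


# Toy inputs (made up for illustration, not results from the thesis).
probs = [[0.9, 0.1], [0.2, 0.8]]
print(inception_score(probs))
print(r_precision([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]], true_index=1))
```

A higher Inception Score indicates that individual images are classified confidently while the set as a whole covers many classes. Averaging `r_precision` over all generated images gives the retrieval-based measure of how well each image matches its own description.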
Keywords/Search Tags: text-to-image, generative adversarial networks, cross-domain feature fusion, pyramid attention mechanism