
Research On Text To Image Generation Based On Deep Neural Network

Posted on: 2022-10-01    Degree: Master    Type: Thesis
Country: China    Candidate: K. Sun    Full Text: PDF
GTID: 2518306740982829    Subject: Software engineering
Abstract/Summary:
Text-to-image generation is a research task that has emerged in recent years; it aims to generate realistic, semantically consistent images from natural language. As a cross-modal problem, it involves two research areas: natural language processing (NLP) and computer vision (CV). The task has a wide range of application scenarios, such as computer-aided design, intelligent healthcare, and news-illustration generation, and it has become a hot topic in the research community. Early attempts at image generation produced images from random noise and therefore could not control the content of the generated image. With the introduction of the Conditional Generative Adversarial Network (CGAN), most current text-to-image methods build on CGAN, using the text as a conditional vector to constrain image generation. Existing explorations of text-to-image generation have already produced some consensus, such as pretraining the text encoder on a multimodal corpus, introducing stacked image-generation strategies, and aligning text and image at a fine-grained level through attention mechanisms.

However, current text-to-image generation methods still face two problems. First, most mainstream methods rely on text-image pairs, and collecting such training data requires considerable manpower and material resources; this is unrealistic for many application domains and makes current methods difficult to extend to other fields. Second, most current methods follow an end-to-end training paradigm that uses only the correspondence between text and image, ignoring other available attribute information, which leads to poor model interpretability. This thesis focuses on these two problems. Its main contributions are as follows:

1. An image generation method based on the Cycle Generative Adversarial Network (CycleGAN) is proposed to solve the task under unsupervised conditions. Current text-to-image methods train end-to-end models on text-image pairs, and the difficulty of obtaining annotated data severely limits their application scenarios. To address this problem, this thesis introduces the idea of cycle reconstruction into the text-to-image generation model and proposes the Cross-Modal CycleGAN (CMCG). In CMCG, the text encoder is pretrained on a larger multimodal corpus, and the authenticity of the generated image is judged by an adversarial loss computed against random images drawn from the dataset. The model aligns the semantic features of text and image through two cycle-reconstruction losses. Experiments on the CUB and Oxford Flower datasets show that the proposed method achieves better image quality as well as good consistency between image and text.

2. A text-to-image generation model that fuses fine-grained attribute information is proposed, which attempts to improve the quality of generated images by exploiting fine-grained attributes ignored by traditional image generation models. Existing text-to-image models are trained only on text-image pairs and ignore the fine-grained attribute information of the image itself, even though this attribute information plays an important role in improving generation quality. To address this problem, this thesis proposes the Attribute-Fused GAN (AF-GAN). AF-GAN encodes the textual information, category information, and fine-grained attribute information separately, and a capsule network transforms the fused information into the condition vector used to generate images. To address the shortcomings of previous image-semantics evaluation, this thesis also proposes a Fine-grained Attribute Matching Score (FAMS) for fine-grained attribute evaluation. FAMS measures the consistency between a generated image and its corresponding text, and it intuitively reflects how well the model learns attribute information. Experiments on the CUB dataset show that introducing fine-grained attribute information improves both the quality of the generated images and the capture of semantic information, and also improves the interpretability of the model.
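The two cycle-reconstruction losses at the heart of CMCG can be sketched as follows. This is a minimal numpy illustration of the idea only, not the thesis implementation: the linear maps G and E stand in for the deep text-to-image generator and image encoder, the embedding dimensions are arbitrary, and all variable names are assumptions. The key point is that no paired data is needed — the text cycle uses only a text embedding, and the image cycle uses only an unpaired real image.

```python
import numpy as np

rng = np.random.default_rng(0)

D_TXT, D_IMG = 8, 16                          # toy embedding sizes (assumed)
G = rng.normal(size=(D_IMG, D_TXT)) * 0.1     # stands in for the text->image generator
E = rng.normal(size=(D_TXT, D_IMG)) * 0.1     # stands in for the image->text encoder

def cycle_losses(text_emb, real_img):
    """Two unpaired cycle-reconstruction losses.

    text cycle : text -> generated image -> reconstructed text
    image cycle: unpaired real image -> text code -> reconstructed image
    (The adversarial realness term against random dataset images is omitted here.)
    """
    fake_img = G @ text_emb                   # generate an image from the text
    text_rec = E @ fake_img                   # re-encode the generated image
    loss_txt_cycle = np.mean((text_emb - text_rec) ** 2)

    text_from_img = E @ real_img              # encode an unpaired real image
    img_rec = G @ text_from_img               # regenerate the image from its text code
    loss_img_cycle = np.mean((real_img - img_rec) ** 2)

    return loss_txt_cycle, loss_img_cycle
```

In the actual model, the sum of these two losses would be minimized jointly with the adversarial loss, pushing the generator and encoder toward mutually consistent text and image representations without ever observing a matched text-image pair.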
Keywords/Search Tags: Text-to-image Generation, Generative Adversarial Network, Unsupervised Learning, Attribute Information Fusion