Research On Text-to-Image Generation Technology Based On Generative Adversarial Network

Posted on:2022-09-23

Degree:Master

Type:Thesis

Country:China

Candidate:S Y Luo

Full Text:PDF

GTID:2518306557970649

Subject:Electronics and Communications Engineering

Abstract/Summary:

In the Internet era, people mainly rely on search engines to retrieve images from databases. However, because a single sentence may correspond to many different images, it is often difficult to find the desired one. With recent breakthroughs in artificial intelligence, text-to-image generation has become feasible. Text-to-image generation is a cross-modal task involving natural language processing and computer vision. Its goal is to ensure not only that the generated images are realistic, but also that they are semantically consistent with the given text description. In the past few years, researchers have proposed a series of text-to-image generation models based on Generative Adversarial Networks (GANs). However, owing to the complexity of the task, the performance of these models still has considerable room for improvement. To further improve model effectiveness, this thesis redesigns the generator and the discriminator of existing models. The specific research contents are as follows.

First, to address the low image quality and poor text-image matching of the traditional stacked generator, this thesis proposes a Semantic Fusion Generative Adversarial Network (SF-GAN). Mainstream text-to-image generation models all use a stacked structure to generate high-resolution images, and this structure easily leads to artifacts in the generated images. Moreover, these models simply concatenate the sentence vector and the noise vector as the generator input and do not make full use of the text description, so the generated image often fails to match the given text. To alleviate these problems, SF-GAN uses a streamlined generator structure to generate high-resolution images. Both the Semantic-based Affine Transformation Module (SATM) and the Semantic-based Joint Attention Module (SJAM) in the SF-GAN generator fully integrate semantic information along the channel and spatial dimensions, making the generated image more consistent with the given text description. Experiments on the public CUB dataset show that, compared with mainstream stacked text-to-image generation models, the images generated by SF-GAN are more realistic and clearer, and match the given text description more closely.

Second, to address the inability of the traditional global discriminator to judge local image details, this thesis proposes a generative adversarial network with an encoder-decoder discriminator (SF-GAN-V2). The traditional discriminator of a text-to-image generation model can only judge whether the image as a whole is real and cannot assess local regions, so the generated image may conform to the text semantics overall yet still lack clarity and realism. To alleviate this problem, SF-GAN-V2 replaces the original discriminator with an encoder-decoder discriminator and, to integrate high-level and low-level semantics, connects the encoder and the decoder with skip connections. The encoder-decoder discriminator can judge the authenticity of both the global image and the local regions within it. In addition, to further improve the localization ability of the encoder-decoder discriminator, SF-GAN-V2 uses CutMix data augmentation to synthesize training images. Experiments on the public CUB and COCO datasets show that the images generated by SF-GAN-V2 are clearer and have more accurate local details.
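
As a rough illustration of the CutMix step mentioned in the abstract, the sketch below pastes a random rectangle from a generated image into a real one and returns a per-pixel mask that an encoder-decoder discriminator could be trained to predict. It is not the thesis's implementation: the function name `cutmix`, the box-size range (`min_frac`, `max_frac`), and the nested-list image format are all assumptions chosen for illustration.

```python
import random


def cutmix(real, fake, rng, min_frac=0.25, max_frac=0.75):
    """Paste a random rectangle of `fake` into a copy of `real`.

    real, fake: H x W images given as nested lists of the same shape
    (a hypothetical format; a real pipeline would use tensors).
    Returns (mixed, mask), where mask[y][x] is 1 for pixels kept from
    `real` and 0 for pixels taken from `fake`.
    """
    h, w = len(real), len(real[0])
    # Sample the box size as a fraction of each side (assumed range).
    bh = max(1, int(h * rng.uniform(min_frac, max_frac)))
    bw = max(1, int(w * rng.uniform(min_frac, max_frac)))
    # Sample the top-left corner so the box stays inside the image.
    top = rng.randint(0, h - bh)
    left = rng.randint(0, w - bw)

    mixed = [row[:] for row in real]
    mask = [[1] * w for _ in range(h)]
    for y in range(top, top + bh):
        for x in range(left, left + bw):
            mixed[y][x] = fake[y][x]
            mask[y][x] = 0
    return mixed, mask


# Toy usage: an all-ones "real" image and an all-zeros "fake" image.
rng = random.Random(0)
real = [[1.0] * 8 for _ in range(8)]
fake = [[0.0] * 8 for _ in range(8)]
mixed, mask = cutmix(real, fake, rng)
```

In a setup like the one the abstract describes, the mask would plausibly serve as the per-pixel real/fake target for the decoder output, while the encoder output still scores the image as a whole. How the thesis actually combines the two losses is not stated in the abstract.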

Keywords/Search Tags:

Natural Language Processing, Computer Vision, Generative Adversarial Networks, Attention

Related items

1	Natural Language Generation Description Method For Short Videos
2	Generative Adversarial Network For Text-to-Image Synthesis
3	Short-Spoken Language Intent Classification With Conditional Sequence Generative Adversarial Network
4	Research On Text-to-Image Generation Based On Generative Adversarial Network
5	Image To Language:Auto Image Captioning Using Bi-directional LSTM And Deep Attention Neural Networks
6	Research On Improved Method Based On Generative Adversarial Model
7	Research On Facial Expression Generation Based On Generative Adversarial Networks
8	The Application Of Adversarial Networks To Speech And Language Tasks
9	The Research And Application On Generative Adversarial Networks Oriented Towards Computer Vision
10	Research On Game Background Stylization Algorithm Based On Generative Adversarial Network