Research On Text-To-Image Generation Technology Based On GAN

Posted on: 2024-04-20

Degree: Master

Type: Thesis

Country: China

Candidate: Y J Zhang

Full Text: PDF

GTID: 2568307100489094

Subject: Electronic information

Abstract/Summary:

In real life, people encounter a large amount of visual and textual information through various channels. Text-to-image generation extracts key feature information from a given textual description and uses it to synthesize a matching image. Currently, text-generated images still have shortcomings in semantics and detail: they are often not consistent with the input text, and there is still significant room for improvement in image realism. This thesis takes AttnGAN as the baseline model and investigates text-to-image generation methods in depth. The main work of this thesis is as follows:

(1) To address AttnGAN's inability to generate fine-grained, realistic images, this thesis introduces a gated-channel attention mechanism to drive the generator and employs a multi-stage architecture to generate intricate images. First, the text encoder extracts word features from the input text. An attention mechanism then emphasizes the important word features, which strengthens the model's ability to learn and represent feature information and improves the realism of the generated images.

(2) To address the inconsistency between the images generated at each stage of AttnGAN and the corresponding textual descriptions, this thesis introduces a text reconstruction method. It reconstructs textual semantic content from the generated images and compares it with the input text, which improves the generator's ability to produce images with the same semantic meaning as the input text.

(3) To address the semantic and texture mismatch between the given text and the generated images in AttnGAN, this thesis introduces circle loss on top of the Deep Attentional Multimodal Similarity Model (DAMSM). This loss function minimizes the discrepancy between the given text and the generated images, optimizing the model and reducing gradients to provide a clearer convergence objective.

The experimental results show that the improved AttnGAN outperforms the original AttnGAN on evaluation metrics such as Fréchet Inception Distance (FID) and Inception Score (IS), indicating that the quality of text-to-image generation has been significantly enhanced.
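
As an illustration of point (3), the following is a minimal Python sketch of the standard circle loss applied to a batch text-image similarity matrix of the kind a DAMSM-style model produces. The abstract does not give the thesis's exact formulation, so the function names (`circle_loss`, `batch_circle_loss`), the margin and scale values, and the convention that matched pairs lie on the diagonal are all assumptions made for illustration.

```python
import math


def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def _softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def circle_loss(sim_pos, sim_neg, margin=0.25, gamma=32.0):
    """Standard circle loss for one anchor.

    sim_pos / sim_neg are non-empty lists of cosine similarities in [-1, 1]
    for matching and non-matching pairs. margin and gamma are illustrative
    defaults, not values taken from the thesis.
    """
    opt_pos, opt_neg = 1.0 + margin, -margin      # optimum similarities
    delta_pos, delta_neg = 1.0 - margin, margin   # decision margins
    # Each pair is weighted by its distance from its optimum (clipped at 0).
    logit_pos = [-gamma * max(opt_pos - s, 0.0) * (s - delta_pos) for s in sim_pos]
    logit_neg = [gamma * max(s - opt_neg, 0.0) * (s - delta_neg) for s in sim_neg]
    # log(1 + sum(exp(logit_neg)) * sum(exp(logit_pos)))
    return _softplus(_logsumexp(logit_neg) + _logsumexp(logit_pos))


def batch_circle_loss(sim):
    """Average text-to-image circle loss over a square similarity matrix.

    Assumption: sim[i][j] is the similarity between text i and image j,
    and the matching pair for text i is image i (the diagonal).
    Requires a batch of at least two pairs so every row has a negative.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        positives = [sim[i][i]]
        negatives = [sim[i][j] for j in range(n) if j != i]
        total += circle_loss(positives, negatives)
    return total / n


aligned = [[0.9, 0.1], [0.0, 0.8]]      # matched pairs score highest
misaligned = [[0.1, 0.9], [0.8, 0.0]]   # mismatched pairs score highest
print(batch_circle_loss(aligned) < batch_circle_loss(misaligned))  # prints True
```

Because each pair is weighted by how far its similarity is from its optimum, badly aligned text-image pairs contribute more to the loss than pairs that are already close to their target.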

Keywords/Search Tags:

Text-to-image generation, GAN, Attention mechanism, Image generation, Circle loss

Related items

1	Research On Text Guided Image Generation Method Based On Attention Mechanism
2	Research On Semantic Consistency In Text-to-Image Generation
3	Research On Text Guided Image Generation Method Based On Adversarial Learning
4	Text-to-image Generation Based On Feature Alignment And Fusion
5	Research On Text To Image Synthesis Algorithm Based On Stacked Generative Adversarial Networks
6	Research On Key Technologies Of Text Generation In Social Media
7	Research On Text To Image Generation Methods Based On Deep Learning
8	Research On Image Description Generation Algorithm Based On Attention Mechanism
9	Research On Data To Text Generation Based On Deep Learning
10	Study On Image Generation Based On Attention Mechanism