In everyday life, people encounter a large amount of visual and textual information through various channels. Text-to-image generation extracts key feature information from a given textual description and synthesizes an image that matches it. Current text-to-image methods still fall short in semantics and detail, failing to achieve consistency between the text and the generated image. Moreover, there remains significant room for improvement in image realism. This paper takes AttnGAN as the baseline model and conducts an in-depth investigation of text-to-image generation methods. The main contributions are as follows:

(1) To address AttnGAN's inability to generate fine-grained, realistic images, this paper introduces a gated-channel attention mechanism to drive the generator and adopts a multi-stage architecture to generate detailed images. First, the text encoder extracts word features from the input text. An attention mechanism then emphasizes the important word features, strengthening the model's ability to learn and represent feature information and improving the realism of the generated images.

(2) To address the inconsistency between the images generated at each stage of AttnGAN and the corresponding textual descriptions, this paper introduces a text reconstruction method. It reconstructs textual semantic content from the generated images and compares it with the input text, encouraging the generator to produce images whose semantics match the input text.

(3) To address the semantic and texture mismatch between the given text and the generated images in AttnGAN, this paper introduces a circle loss built on the Deep Attentional Multimodal Similarity Model (DAMSM). This loss minimizes the discrepancy between the given text and the generated images and scales down the gradient of each similarity score as it approaches its optimum, giving the model a clearer convergence target.

Experimental results show that the improved AttnGAN outperforms the original AttnGAN on evaluation metrics including Fréchet Inception Distance (FID) and Inception Score (IS), indicating that the proposed improvements significantly enhance the quality of text-to-image generation.
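The abstract does not spell out the form of the gated-channel attention in contribution (1). As a rough illustration only, the sketch below assumes a squeeze-and-excitation-style channel gate: each channel of a generator feature map is average-pooled to one number, passed through a small bottleneck with a ReLU, and squashed by a sigmoid into a gate in (0, 1) that rescales that channel. The function name, the weight shapes, and the reduction ratio are hypothetical choices, not the paper's design.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def gated_channel_attention(x, w1, w2):
    """Hypothetical SE-style channel gate (an assumption, not the paper's exact module).

    x:  feature map of shape (C, H, W)
    w1: bottleneck weights of shape (C // r, C), r = reduction ratio
    w2: expansion weights of shape (C, C // r)
    Returns the gated feature map and the per-channel gate values.
    """
    z = x.mean(axis=(1, 2))        # squeeze: global average per channel, shape (C,)
    h = np.maximum(w1 @ z, 0.0)    # bottleneck projection + ReLU, shape (C // r,)
    gate = sigmoid(w2 @ h)         # per-channel gate in (0, 1), shape (C,)
    # Broadcast the gate over the spatial dimensions to rescale each channel.
    return x * gate[:, None, None], gate


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 4, 4))   # 8 channels, 4 x 4 spatial grid
    w1 = rng.standard_normal((2, 8))     # reduction ratio r = 4
    w2 = rng.standard_normal((8, 2))
    out, gate = gated_channel_attention(x, w1, w2)
    print(out.shape, gate.round(3))
```

Because every gate lies strictly between 0 and 1, channels the gate judges uninformative are attenuated while the spatial layout of each channel is left unchanged.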