
Research On Key Technologies Of Text-to-image Generation

Posted on: 2024-04-18    Degree: Master    Type: Thesis
Country: China    Candidate: P Wang    Full Text: PDF
GTID: 2568307106981549    Subject: Electronic information
Abstract/Summary:
Text-to-image generation (generating images from text descriptions and guiding image editing with text information) spans natural language processing and computer vision and has become a hot research field in recent years. However, many problems remain to be solved, such as fine-grained enhancement, the quality and accuracy of generated images, and the fidelity and controllability of text-guided image editing. This thesis studies text-to-image generation to address these problems; the main works are as follows:

(1) Standard normal noise does not provide enough information to synthesize images close to the true distribution, and multi-stage training complicates the text-to-image generation pipeline. This study proposes a novel feature-grounded single-stage text-to-image generation model. The model first learns a real-distribution vector from the training images and takes this vector as an enhanced noise input. The enhanced noise carries more information than conventional normally distributed noise, which brings the generated images closer to the real distribution. Moreover, the model considers the similarity relationships among the real image, the reference text, and the generated image, and introduces a worst-case similarity optimization into the loss function to strengthen the model's generation capability. Experiments on two benchmark datasets show that the model achieves an FID score of 19.08 and an IS score of 4.79±0.03 on the CUB dataset, and an FID score of 27.89 and an IS score of 14.75±0.32 on the MS-COCO dataset. Compared with classical and recent works, the proposed model performs better or comparably on image-authenticity and diversity metrics, and improves the similarity among generated images, texts, and real images.

(2) In 2022, with the development of diffusion models, the attention of T2I
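The two ideas in contribution (1) can be illustrated with a minimal sketch. The function names, the mixing coefficient `alpha`, and the way the learned distribution vector is blended with Gaussian noise are illustrative assumptions, not the thesis's actual architecture; the sketch only shows the general shape of feature-grounded noise augmentation and a worst-case (minimum) similarity loss:

```python
import numpy as np

def feature_grounded_noise(dist_vec, rng, alpha=0.5):
    """Blend a learned real-distribution vector with standard normal noise.

    dist_vec: (d,) vector learned from training images (hypothetical stand-in
    for the thesis's real-distribution vector). The result carries more
    information than plain N(0, I) noise.
    """
    z = rng.standard_normal(dist_vec.shape)
    return alpha * dist_vec + (1.0 - alpha) * z

def worst_case_similarity_loss(sim_gen_text, sim_gen_real, sim_text_real):
    """Penalize the weakest pairwise similarity among the generated image,
    the reference text, and the real image, so optimization pushes all
    three pairs to agree rather than only the easiest one."""
    return 1.0 - min(sim_gen_text, sim_gen_real, sim_text_real)
```

Because the loss targets the minimum similarity, the generator cannot trade off text alignment against realism: improving the already-strong pair does not reduce the loss.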
researchers gradually shifted from GAN-based models to diffusion models. However, diffusion models still have problems: the alignment between generated images and the semantic text is unsatisfactory, and the fidelity of the unedited part of an image is lower than in the original. This study proposes an attention-enhanced text-to-image generation model based on the diffusion model. By injecting attention into the high-resolution and bottleneck layers of the model, it enhances the model's generative capability and guarantees the quality of the generated images, alleviating distortion in edited images and semantic-accuracy issues. Experiments were carried out on two large datasets and one face dataset. On the image generation task, the subjective evaluation achieved 30.2% accuracy and 37.3% authenticity; the subjectively evaluated text-image similarity on MS-COCO is 0.191, and the objective VQA score on the CelebA-HQ dataset reaches 0.783. On the image editing task, the subjective evaluation achieved 64.8% accuracy and 68.2% authenticity, and the objective FID metric scored 18.96 on face data and 65.84 on non-face data. All in all, the model improves image quality, accuracy, and reliability.
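The attention-injection idea in contribution (2) can be sketched as follows. This is a toy, untrained illustration, not the thesis's model: the stage names, the set of injection points, and the single-head attention with Q = K = V are all assumptions made only to show how attention can be added selectively at the high-resolution and bottleneck stages of a U-Net while leaving other stages untouched:

```python
import numpy as np

def self_attention(x):
    """Toy single-head self-attention over a (tokens, dim) feature map,
    using the features themselves as queries, keys, and values."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def unet_stage(x, stage_name, inject_at):
    """Hypothetical U-Net stage: attention is injected as a residual branch
    only at the stages named in `inject_at` (e.g. high-resolution and
    bottleneck layers, as the abstract describes)."""
    if stage_name in inject_at:
        x = x + self_attention(x)  # residual attention injection
    return x
```

A residual injection (`x + attention(x)`) is a common design choice here: stages without attention pass features through unchanged, so the modification is localized to the chosen layers.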
Keywords/Search Tags:Single-stage Training, Noise Augmentation, Generative Adversarial Networks, Attention Augmentation, Diffusion Models, Image Generation