
Text-to-image Algorithm Based On Generative Adversarial Network

Posted on: 2024-09-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y R Duan
Full Text: PDF
GTID: 2568307157472734
Subject: Information and Communication Engineering
Abstract/Summary:
The text-to-image task converts descriptive text into visual pictures. Generative Adversarial Networks (GANs) can realize this transformation, but because the task is cross-modal and complex, current GAN-based text-to-image algorithms still suffer from three problems: regions unrelated to the text are generated with low quality, fine details are generated poorly, and the mapping relationship between text and image is missing in the initialization stage. To address these shortcomings of existing models, the generator and discriminator networks are reconstructed separately. The details of the study are as follows:

(1) To improve the quality of the generated images and strengthen the network's generation of detail, a multi-level affine-combination text-to-image network (AF-GAN) is proposed on the basis of the existing multi-stage network. For the former, a text-image affine combination module is added to improve the fine-grained properties of the generated images and strengthen the cross-modal connection between text and image; bias terms record the features of text-independent regions so that even those regions are generated with high quality in the final output. For the latter, a detail correction module is added to the network: word-level features are combined with image information, and a spatial and channel attention mechanism focuses on the main feature information to further enhance details in the synthesized image. An affine module is also added here to refine missing content in the generated image.

(2) To solve the lack of a mapping relationship between text and image in the initial stage, and the lack of detailed feedback from the discriminator to the generator, an adaptive multi-cascade text-to-image network (AM-GAN) is proposed, with three main improvements. First, a cross-attention encoding structure is added in the initial image generation stage: the text information is input to this encoder together with the image, and it outputs cross-attention features aligned with the image features, reflecting the text-image mapping relationship and improving the quality of the generated images. Second, instance normalization is used as the normalization method to improve the stability of the trained model. Third, the discriminator network uses an adaptive discriminator that returns its results to the generator, allowing the generator to capture information about different image regions and thus perform more detailed generation.

For the above models, extensive experiments were conducted on the CUB and COCO datasets to evaluate the quality of the generated samples and compare the metric values. On the CUB dataset, AF-GAN improves the IS metric by 0.58 and reduces FID by 5.69 compared with the previous best results; on the CUB and COCO datasets, AM-GAN raises IS to 5.51 and 32.51 and lowers FID to 10.21 and 30.06, respectively. These results demonstrate the effectiveness of the proposed algorithms. The details of the images generated on each dataset are then analyzed to further illustrate the feasibility and superiority of the algorithms.
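To make the two core operations concrete, the following is a minimal numpy sketch of (a) text-conditioned affine modulation, in the spirit of the affine combination module (scale and shift image features with parameters predicted from a text embedding), and (b) scaled dot-product cross-attention between image regions and word features, as used in the cross-attention encoder. This is an illustrative sketch only, not the thesis's actual implementation; all function names, variable names, and dimensions here are hypothetical, and the projection matrices would be learned in a real network.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def affine_combination(img_feat, text_emb, W_gamma, W_beta):
    """Text-conditioned affine modulation: out = gamma * x + beta.
    img_feat: (C, H, W) image feature map
    text_emb: (D,) sentence embedding
    W_gamma, W_beta: (C, D) projections (learned in practice)
    """
    gamma = W_gamma @ text_emb                     # (C,) per-channel scale
    beta = W_beta @ text_emb                       # (C,) per-channel shift
    return gamma[:, None, None] * img_feat + beta[:, None, None]

def cross_attention(region_feats, word_feats, Wq, Wk, Wv):
    """Image regions attend to words (queries from image, keys/values from text).
    region_feats: (N, D) flattened image regions; word_feats: (T, D) word embeddings.
    Returns (N, D) text-aligned features for each image region.
    """
    Q = region_feats @ Wq                          # (N, D)
    K = word_feats @ Wk                            # (T, D)
    V = word_feats @ Wv                            # (T, D)
    attn = softmax(Q @ K.T / np.sqrt(K.shape[-1])) # (N, T) region-to-word weights
    return attn @ V

# Toy usage with random features
rng = np.random.default_rng(0)
C, D, H, W = 4, 8, 2, 2
out = affine_combination(rng.standard_normal((C, H, W)),
                         rng.standard_normal(D),
                         rng.standard_normal((C, D)),
                         rng.standard_normal((C, D)))
print(out.shape)  # (4, 2, 2)

ctx = cross_attention(rng.standard_normal((6, D)), rng.standard_normal((5, D)),
                      np.eye(D), np.eye(D), np.eye(D))
print(ctx.shape)  # (6, 8)
```

In a full model these operations run at every resolution level of the multi-stage generator; the sketch shows only the per-layer arithmetic.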
Keywords/Search Tags: Text to Image, Generative Adversarial Network, Affine Combination Module, Detail Correction, Cross Attention