
Research On Semantic Consistency In Text-to-Image Generation

Posted on: 2022-10-13 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: H C Tan | Full Text: PDF
GTID: 1488306338984839 | Subject: Computational Mathematics
Abstract/Summary:
The aim of text-to-image generation is to analyze the semantics of an input text description and generate images that faithfully reflect it. The heterogeneity between text and image makes it difficult for the generator to effectively analyze the semantics of the text and generate high-quality images, which makes the task highly challenging. The development of cross-media intelligence and the creation of large-scale datasets have brought new opportunities to this task, and text-to-image generation is therefore a hot topic in the computer vision and multimedia intelligence communities. This thesis studies the semantic consistency between the text description and the generated image in the text-to-image generation task. The main research contributions are as follows:

(1) To address insufficient semantic constraints and the interference of non-keywords in the generation of image details, this thesis proposes a semantics-enhanced text-to-image generation algorithm. The algorithm constrains the semantics of the generated image to approximate those of the real image and filters out non-keywords in the text description. To strengthen semantic constraints, a semantic consistency module is proposed. This module first uses a siamese network to extract global features of the generated image and the real image. A contrastive loss then pulls each generated image toward its corresponding real image and pushes it away from images associated with different text descriptions. Finally, a new sliding loss balances the training weights of hard and easy sample pairs within the module. To overcome the interference of non-keyword information in the generation of image details, an attention competition mechanism is proposed. Experiments show that this mechanism effectively filters out attention on non-keywords and strengthens the guidance of keywords in detail generation. Extensive experiments demonstrate that the algorithm greatly enhances the semantic expressiveness of the generated images.

(2) To build a semantic bridge between text and image, this thesis proposes a knowledge-transfer text-to-image generation algorithm. The algorithm bridges the semantic gap between text and image from the perspective of cross-modal semantic distillation, and a new attention mechanism helps the generator adaptively adjust local semantic information. To reduce the cross-modal semantic barrier between text and image, a cross-modal semantic distillation mechanism is proposed for the first time. This mechanism uses an image-to-image generation task to guide the text-to-image generation task, so that the generator better extracts text features and produces high-quality images. In this mechanism, an image-to-image generation model is first trained; next, the proposed cross-modal distillation loss constrains the text features to be consistent with the image features in both semantics and category distribution; finally, the generator and discriminator of the image-to-image task are used to initialize the generator and discriminator of the text-to-image task. To enhance the detail semantics of the generated image, an alternate attention-transfer module is proposed. This module helps the generator adaptively adjust the weights of words and image sub-regions, enhancing the detail semantics of the generated image. Experimental results show that the algorithm greatly improves image details and global semantics, and even the object layout of complex scenes is significantly improved.

(3) To reduce the complexity of the image distribution and help the generator better capture key features, this thesis proposes a distribution-regularization text-to-image generation algorithm. From the perspective of distribution regularization, the algorithm helps the generator better capture the real image distribution. To reduce the complexity of the image distribution, a distribution normalization module introduces a variational auto-encoder into the discriminator of the GAN. This module helps the discriminator better learn the decision boundary between the latent distributions of generated and real images. Within the module, a distribution adversarial loss aligns the distributions learned in the generator with the distributions normalized in the discriminator. In addition, a semantic disentangling module extracts key information that is conducive to image generation, driven by a proposed semantic disentangling loss that uses constraints on distribution statistics to separate key features from non-key features. Experiments demonstrate that the algorithm effectively improves the quality of the generated image distribution and generates high-quality images, achieving strong performance in both image diversity and semantic consistency. Moreover, the distribution normalization module and the semantic disentangling module can further improve the performance of other text-to-image generation algorithms.
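The exact form of the contrastive loss in contribution (1) is not given in this abstract; a minimal margin-based sketch of the pull/push behavior it describes, with all names and the margin value being illustrative assumptions, might look like:

```python
import numpy as np

def contrastive_loss(f_gen, f_real, match, margin=1.0):
    """Margin-based contrastive loss over siamese global features (sketch).

    match=1: pull the generated image's feature toward its real image;
    match=0: push it away from a real image of a different description.
    The thesis's sliding loss would additionally reweight hard vs. easy
    pairs; that weighting is unspecified here and omitted.
    """
    d = float(np.linalg.norm(f_gen - f_real))  # Euclidean feature distance
    if match:
        return d ** 2                          # matched pair: minimize distance
    return max(0.0, margin - d) ** 2           # mismatched pair: enforce margin
```

A matched pair with identical features incurs zero loss, while a mismatched pair only incurs loss when its distance falls inside the margin.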
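Contribution (2) constrains text features to match image features in both semantics and category distribution. A hedged sketch of such a two-term distillation loss, where the feature term, the KL term, and the weighting `alpha` are all assumptions rather than the thesis's actual formulation, could be:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # shift for numerical stability
    return e / e.sum()

def cross_modal_distillation_loss(text_feat, image_feat,
                                  text_logits, image_logits, alpha=0.5):
    """Cross-modal distillation sketch: the image branch acts as teacher.

    semantic term: mean squared error between text and image features;
    category term: KL divergence from the image branch's class
    distribution to the text branch's class distribution.
    """
    semantic = float(np.mean((text_feat - image_feat) ** 2))
    p = softmax(image_logits)                   # teacher distribution
    q = softmax(text_logits)                    # student distribution
    kl = float(np.sum(p * np.log(p / q)))
    return alpha * semantic + (1 - alpha) * kl
```

When the two branches agree exactly, both terms vanish and the loss is zero.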
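Contribution (3) places a variational auto-encoder inside the discriminator. The thesis's distribution normalization module is not specified here, but the two standard VAE ingredients it presumably builds on, the reparameterization trick and the KL penalty toward a standard normal, can be sketched as follows (both function names are hypothetical):

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(N(mu, sigma^2) || N(0, I)) used as a VAE regularizer."""
    return float(0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar))

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), keeping the
    sampling step differentiable with respect to mu and logvar."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps
```

Regularizing the discriminator's latent codes toward a simple prior in this way is what would let it learn a cleaner decision boundary between generated and real image distributions.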
Keywords/Search Tags: Generative Adversarial Net, Text-to-Image Generation, Semantic Consistency, Knowledge Distillation Mechanism, Distribution Normalization Mechanism, Text-Image Attention Mechanism