
Text To Image Generation Based On Generative Adversarial Nets

Posted on: 2020-11-01
Degree: Master
Type: Thesis
Country: China
Candidate: K L Xu
Full Text: PDF
GTID: 2428330599459715
Subject: Information and Communication Engineering
Abstract/Summary:
Text-to-image generation is an important research direction in the field of image generation: its goal is to recover the semantic relations described in a text and use image generation techniques to synthesize semantically consistent images. With the rise and rapid development of Generative Adversarial Networks (GANs), text-to-image generation has advanced continuously and has become one of the research hotspots in computer vision and artificial intelligence. The performance of text-to-image generation depends to a large extent on how the generator and discriminator of the adversarial network are trained. To alleviate the insufficient diversity of generated samples caused by the mode collapse that is ubiquitous in GANs, this thesis starts from a stacked text-to-image generation model and introduces mutual-information and Pearson-correlation constraints on the generator and discriminator, yielding a text-to-image generation method based on mutual information maximization and a text-to-image generation method based on Pearson-correlation reconstruction. The main research results are as follows:

1. To address the insufficient diversity of the sample distribution produced by stacked text-to-image models, a stacked text-to-image generative adversarial network with local-global mutual information maximization is proposed. First, the global vector is decoupled from the generative model to obtain feature maps at different scales. Then, the mutual information between each feature map and the global vector is maximized to strengthen the correlation between the global features and the text description. Finally, local position feature vectors are extracted from the feature maps, and the average mutual information between these local vectors and the global vector is maximized to strengthen the correlation between local features and the text description, yielding a tighter text-to-image mapping (a sketch of this objective is given after the abstract). Experimental analysis and results show that the method effectively improves the diversity and semantic precision of the generated samples and brings them closer to natural images.

2. A text-to-image generation model that maximizes the Pearson correlation coefficient is proposed to address the problem that, in GANs, the discriminator converges too quickly to provide useful gradients to the generator, which makes it difficult to improve sample diversity and image quality. The model improves the discriminator so that it discriminates and encodes at the same time: this equips the model with an inference network, limits the discriminator's discriminative capacity, improves its generalization, and makes it easier for the generator and discriminator to reach a Nash equilibrium during training. In addition, to enforce consistency across multi-scale image encodings, a multi-scale joint loss is proposed in which the feature vector at each scale takes the input combined vector as a shared reconstruction target (see the second sketch after the abstract). Experimental analysis and theoretical arguments show that the method effectively improves the diversity and representation quality of the generated samples. Interpolation experiments further show that the reconstruction produces images whose overall contour and style are consistent with the generated samples, indicating that the improved discriminator encoder produces valid feature vectors.
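As a rough illustration of the first contribution: the abstract does not specify which mutual-information estimator is used, so the following minimal PyTorch sketch assumes a Jensen-Shannon-based MI lower bound in the style of Deep InfoMax, with negative pairs formed by shuffling global vectors within a batch. The names `LocalMIEstimator`, `feat_map`, and `global_vec` are hypothetical placeholders, not identifiers from the thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalMIEstimator(nn.Module):
    """Scores (local feature, global vector) pairs; higher = more dependent.
    Hypothetical design: broadcast the global vector to every spatial
    position of the feature map, then score each position with 1x1 convs."""
    def __init__(self, local_dim, global_dim, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(local_dim + global_dim, hidden, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, 1),
        )

    def forward(self, feat_map, global_vec):
        b, _, h, w = feat_map.shape
        g = global_vec.unsqueeze(-1).unsqueeze(-1).expand(-1, -1, h, w)
        return self.net(torch.cat([feat_map, g], dim=1))  # (B, 1, H, W)

def local_mi_loss(estimator, feat_map, global_vec):
    """Jensen-Shannon MI lower bound, averaged over spatial positions.
    Positive pairs: a feature map with its own global vector.
    Negative pairs: the same map with global vectors rolled in the batch."""
    pos = estimator(feat_map, global_vec)
    neg = estimator(feat_map, global_vec.roll(1, dims=0))
    # Maximizing E[-softplus(-pos)] - E[softplus(neg)] tightens the bound,
    # so we minimize the negation below.
    return (F.softplus(-pos) + F.softplus(neg)).mean()
```

In a stacked model, one such loss term would be computed per scale and added, with a weighting coefficient, to the usual adversarial generator loss.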
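For the second contribution, a minimal sketch of a Pearson-correlation reconstruction term and the multi-scale joint loss, again under stated assumptions: the exact form of the loss is not given in the abstract, so this assumes each scale's encoding is pushed toward maximal Pearson correlation with the input combined vector (e.g. the concatenation of the noise and the text embedding), and that maximizing the correlation r is implemented as minimizing 1 - r. All names here are illustrative.

```python
import torch

def pearson_corr(x, y, eps=1e-8):
    """Per-sample Pearson correlation coefficient between two (B, D) tensors."""
    x = x - x.mean(dim=1, keepdim=True)
    y = y - y.mean(dim=1, keepdim=True)
    num = (x * y).sum(dim=1)
    den = x.norm(dim=1) * y.norm(dim=1) + eps
    return num / den  # shape (B,), values in [-1, 1]

def multiscale_pearson_loss(encodings, target):
    """Multi-scale joint loss (assumed form): every scale's encoding from the
    discriminator-encoder uses the same input combined vector as its
    reconstruction target, so the targets stay consistent across scales."""
    return sum((1.0 - pearson_corr(e, target)).mean() for e in encodings) / len(encodings)

# Illustrative usage with made-up dimensions:
z = torch.randn(8, 100)                      # noise vector
t = torch.randn(8, 128)                      # text embedding
target = torch.cat([z, t], dim=1)            # input combined vector
encodings = [torch.randn(8, 228) for _ in range(3)]  # one encoding per image scale
loss = multiscale_pearson_loss(encodings, target)
```

Because the Pearson coefficient is invariant to the scale and offset of the encodings, this constraint limits the discriminator more gently than an exact (e.g. L2) reconstruction would, which is consistent with the abstract's aim of restraining the discriminator without stopping its gradients.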
Keywords/Search Tags: Image generation, Generative Adversarial Networks, Maximizing mutual information, Pearson correlation coefficient, Stacked