
Research On Semantic Image Generation Model Based On Generative Adversarial Network

Posted on: 2022-02-24    Degree: Master    Type: Thesis
Country: China    Candidate: S S Zhang    Full Text: PDF
GTID: 2518306326471584    Subject: Software engineering
Abstract/Summary:
In recent years, image generation has become a research focus in the field of computer vision. Image generation models in deep learning not only generate images for users automatically, which aids visual understanding, but also promote cross-modal learning and reasoning. Moreover, they are of great significance to the development of visual computing, image and language processing, and human-computer interaction. Existing generative models can synthesize simple individual objects and low-resolution complex images, but directly generating high-quality semantic images containing multiple entities and reasonable layouts from complex text or structured scene descriptions remains challenging. To maintain semantic consistency between the input text descriptions and the generated images, improve the quality of images generated from structured scene graphs, and enhance fine-grained texture information so that interactions between different objects are rendered accurately, this thesis constructs and analyzes image generation models conditioned on generative adversarial networks. The main contributions are as follows:

(1) To address the problems of fuzzy instance features and insufficient visual attributes when generating images from complex text descriptions, an instance mask embedding and attribute-adaptive generative adversarial network (IMEAA-GAN) is proposed for text-to-image synthesis. First, to overcome the complexity and ambiguity of a whole sentence, we explicitly use word-level embeddings as input and apply a box regression network to obtain a global layout containing spatial positions, object sizes, and class labels. The global generator then encodes this layout and combines the whole-text embedding with noise to generate a preliminary low-resolution image. To let the local refinement generators learn instance-level, fine-grained features, we propose an instance mask embedding mechanism that adds pixel-level mask constraints. Finally, two word-level, attribute-adaptive discriminators, instead of the commonly used sentence-conditional discriminators, classify each attribute independently and provide precise signals that guide the generators to synthesize specific visual attributes. Extensive experimental results and analysis show that the model obtains globally consistent attributes and synthesizes complex images with local texture details.
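To make the layout step concrete, the following is a minimal PyTorch sketch of two ingredients named above: a box regression network that maps a word embedding to a normalized layout box plus class logits, and a pixel-level mask constraint in the spirit of the instance mask embedding mechanism. The module names, dimensions, and loss form are illustrative assumptions, not the thesis's actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BoxRegressor(nn.Module):
    """Illustrative box-regression head: word embedding -> (x, y, w, h) in [0, 1] plus class logits."""
    def __init__(self, embed_dim=128, num_classes=80):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(embed_dim, 256), nn.ReLU())
        self.box_head = nn.Linear(256, 4)             # spatial position and object size
        self.cls_head = nn.Linear(256, num_classes)   # class label

    def forward(self, word_emb):
        h = self.backbone(word_emb)
        boxes = torch.sigmoid(self.box_head(h))       # normalized layout coordinates
        return boxes, self.cls_head(h)

def mask_embedding_loss(pred_masks, gt_masks):
    """Pixel-level mask constraint (assumed form): binary cross-entropy
    between predicted and reference instance masks."""
    return F.binary_cross_entropy(pred_masks, gt_masks)

# Toy usage: 3 object words from a sentence, 64x64 instance masks.
words = torch.randn(3, 128)
reg = BoxRegressor()
boxes, logits = reg(words)                            # (3, 4), (3, 80)
pred = torch.sigmoid(torch.randn(3, 64, 64, requires_grad=True))
gt = (torch.rand(3, 64, 64) > 0.5).float()
loss = mask_embedding_loss(pred, gt)
loss.backward()
print(boxes.shape, logits.shape, loss.item())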
(2) To solve the difficulty of generating diverse instances and high-quality complex scene layouts from structured scene graphs, a background and foreground disentangled generative adversarial network (BFD-GAN) is introduced. To begin with, to avoid the ambiguity of text descriptions, we explicitly convert them into scene graphs, from which a graph convolutional network infers semantic backgrounds. The foregrounds and backgrounds are disentangled and generated separately, which improves image quality and reduces generation complexity. Then, to avoid labor-intensive manual labeling, a foreground parsing module is put forward; it explicitly computes fine-grained foregrounds with recognizable geometric appearances in an unsupervised manner. Finally, a foreground-background integrating module adaptively optimizes and refines the visual features. Within this module, foreground-relation aware attention computes interaction biases between foreground pairs, and max pooling handles pixel overlapping so that the most relevant context-level pixels are selected for feature representation. In this way, foreground and background are made compatible with each other, which benefits the generation of reasonable, high-fidelity complex images. Benchmarked against existing methods, our model is more capable of generating complex backgrounds and correspondingly sharp foregrounds for given scene structures.
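As a rough illustration of the scene-graph reasoning and the overlap handling described above, the sketch below implements a toy graph-convolution step over scene-graph edges and a max-pooling composition of overlapping foreground feature maps. The class names, tensor shapes, and message-passing form are assumptions for illustration, not the BFD-GAN implementation.

import torch
import torch.nn as nn

class SceneGraphConv(nn.Module):
    """Illustrative graph-convolution layer: node features are updated
    by messages aggregated along scene-graph relations."""
    def __init__(self, dim=128):
        super().__init__()
        self.lin = nn.Linear(dim * 2, dim)

    def forward(self, nodes, edges):
        # nodes: (N, dim) node features; edges: (E, 2) (subject, object) index pairs
        agg = torch.zeros_like(nodes)
        for s, o in edges.tolist():
            msg = self.lin(torch.cat([nodes[s], nodes[o]], dim=-1))
            agg[o] = agg[o] + msg          # message passing along each relation
        return torch.relu(nodes + agg)

def composite_foregrounds(fg_maps):
    """Max pooling across overlapping foreground feature maps: at each
    pixel, the strongest (most relevant) response is kept."""
    return fg_maps.max(dim=0).values

# Toy usage: 4 scene-graph nodes, 3 relations, 2 foreground maps on a 32x32 grid.
nodes = torch.randn(4, 128)
edges = torch.tensor([[0, 1], [1, 2], [2, 3]])
gcn = SceneGraphConv()
ctx = gcn(nodes, edges)                    # (4, 128) relation-aware node features
fg = torch.randn(2, 16, 32, 32)            # (num_foregrounds, channels, H, W)
merged = composite_foregrounds(fg)         # (16, 32, 32)
print(ctx.shape, merged.shape)

Taking the per-pixel maximum is one simple way to realize "the most relevant pixel wins" when foregrounds overlap; the thesis's actual module additionally weights features with foreground-relation aware attention.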
Keywords/Search Tags: Generative adversarial network, Semantic image synthesis, Instance mask embedding mechanism, Foreground parsing module, Foreground-background integrating module