
Research On Generative Adversarial Network-Based Cross-modal Image Generation

Posted on: 2021-01-08 | Degree: Master | Type: Thesis
Country: China | Candidate: K Yue | Full Text: PDF
GTID: 2428330614971223 | Subject: Computer Science and Technology
Abstract/Summary:
Image generation, an important problem in machine learning, has attracted growing attention with the rise of generative adversarial networks. Besides filling in missing data and producing approximations of real data, generative models can be used to verify a model's capacity to express high-dimensional probability distributions, to handle multi-modal output problems, and to support reinforcement learning.

Conventional single-modal image generation takes random noise or a base image as input. A network obtained this way has limited flexibility: it can only manipulate images along one or a few fixed patterns. Introducing text information, by contrast, gives the network far more flexibility. By associating semantic information with image features, the generated image can change as the given text description changes. Compared with single-modal generation, semantic image generation must also consider, beyond the network architecture itself, how to map data from the two different modalities onto each other. At present, existing models for text-guided image manipulation produce images of inferior quality and low resolution. To address this, we put forward new solutions in this research. The main work and contributions are as follows.

First, we present a new progressive cross-modal image generation model. The generators of previous methods all directly combine the visual features extracted from the original image with the semantic features, then feed the fused features into several residual blocks for transformation; on the discriminator side, they rely on different loss functions to improve training. However, as the image resolution increases, training such a network becomes harder. We therefore manipulate the image progressively, starting from a low resolution, and we introduce a union module that retains more of the original image's details during manipulation (a minimal sketch of this design follows the abstract). Experimental results on a fine-grained image dataset demonstrate the effectiveness of our method.

Second, because fine-grained image datasets lack text descriptions, we incorporate bilinear attention pooling into a unified cross-modal visual-semantic embedding model (also sketched below). The improved model not only strengthens the mapping between fine-grained visual features and semantic features, but also reduces interference from the image background. To a certain extent, it can correctly generate the corresponding image from the given semantics even under small-sample conditions. To verify the effect of our method, we make a detailed comparison with the original method, and the superiority of the improved method is fully confirmed.
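The following is a minimal PyTorch sketch of the progressive, union-module idea described above, included only as an illustration: the module names (UnionModule, Stage), the gated fusion rule, and all channel sizes are assumptions made for exposition, not the thesis's actual implementation.

import torch
import torch.nn as nn

class UnionModule(nn.Module):
    # Assumed design: a learned gate mixes the edited features with the
    # original-image features, so details unrelated to the text edit are
    # carried through unchanged.
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels * 2, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, edited, original):
        g = self.gate(torch.cat([edited, original], dim=1))
        return g * edited + (1 - g) * original

class Stage(nn.Module):
    # One progressive stage: fuse the sentence embedding with the image
    # features, refine with conv blocks, preserve original detail via the
    # union module, then upsample 2x for the next stage.
    def __init__(self, channels, text_dim):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, channels)
        self.refine = nn.Sequential(
            nn.Conv2d(channels * 2, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.union = UnionModule(channels)
        self.up = nn.Upsample(scale_factor=2, mode="nearest")

    def forward(self, feat, text_emb):
        # Broadcast the sentence embedding over the spatial grid.
        t = self.text_proj(text_emb)[:, :, None, None].expand_as(feat)
        edited = self.refine(torch.cat([feat, t], dim=1))
        return self.up(self.union(edited, feat))

Stacking several such stages grows the working resolution gradually (for example 64 to 128 to 256), which is the sense in which training starts at an easy low-resolution problem and only then moves to harder high-resolution manipulation.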
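Similarly, the sketch below shows bilinear attention pooling as a drop-in feature-pooling step for the visual-semantic embedding, again assuming PyTorch; the attention head, the number of parts, and the normalization are assumptions, not the thesis's exact formulation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BilinearAttentionPooling(nn.Module):
    # Predicts one spatial attention map per object part and pools the
    # backbone features under each map, so the embedding is driven by
    # fine-grained parts rather than the background.
    def __init__(self, in_channels, num_parts):
        super().__init__()
        self.attn = nn.Conv2d(in_channels, num_parts, kernel_size=1)

    def forward(self, feat):                # feat: (B, C, H, W)
        a = torch.relu(self.attn(feat))     # (B, M, H, W) part attention
        # Bilinear pooling: part feature m is the attention-weighted sum
        # of the feature map over all spatial positions.
        parts = torch.einsum("bmhw,bchw->bmc", a, feat)
        # L2-normalize the flattened part matrix before projecting it
        # into the joint visual-semantic space.
        return F.normalize(parts.flatten(1), dim=1)  # (B, M * C)

In a full pipeline, the pooled part embedding would be projected into the same space as the text embedding and trained with a matching objective such as a triplet or ranking loss, so that a caption is aligned with the correct fine-grained parts rather than the background.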
Keywords/Search Tags: Generative Adversarial Network, Image Generation, Semantic Image Synthesis, Cross-modal, Visual-semantic Embedding