| Over the past few years, computer vision and image processing have benefited greatly from breakthroughs in deep learning. As hardware has improved, image generation has developed rapidly, and researchers have turned to building systems that understand the relationship between vision and language and can create images reflecting the meaning of a text description. Although text-to-image generation has made progress in recent years, challenges remain in image quality and semantic consistency. This thesis studies methods for generating images from text based on generative adversarial networks. The main work is as follows: (1) The network structures and implementations of recent text-to-image generation methods are studied, and, building on AttnGAN, improvements are made to address the low quality of generated images and their mismatch with the semantics of the given text. The generator is driven by word-level spatial and channel attention, which lets it synthesize sub-regions corresponding to the most relevant words, and a multi-stage architecture progressively improves the quality of the generated images. An image description (captioning) method is introduced to re-describe the content of the generated image as text; comparing the semantics of the original text with those of the reconstructed text pushes the generator to produce an image with the same semantics as the given text. (2) To address the problem that existing methods rely too heavily on the initial image, the text-to-image generation method is improved by introducing a key-value dynamic memory network. A generative adversarial network first generates an initial image, which is then gradually refined by a generative network improved on the basis of a dynamic memory model. A memory write gate and a response gate select relevant words according to the initially generated image, which addresses the problem that the attention mechanism gives each word the same weight across different images and allows the image to be refined in a more targeted way. Experiments on the public datasets Caltech-UCSD Birds-200-2011 (CUB) and Microsoft COCO show that the proposed methods improve, to varying degrees, the quality and semantic consistency of the generated images. |
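The gating idea in contribution (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes word and image-region features share one dimension, uses the memory slots directly as both keys and values (no learned key/value embeddings), and reduces each gate to a single sigmoid over two hypothetical parameter vectors (`a`, `b` for the write gate; `u`, `v` for the response gate).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def write_memory(words, regions, a, b):
    """Memory write gate: weight each word by its relevance to the current image.

    a and b are hypothetical gate parameters (assumed, not from the thesis).
    """
    r_mean = mean_vector(regions)  # summary of the initial image features
    memory = []
    for w in words:
        g = sigmoid(dot(a, w) + dot(b, r_mean))  # image-dependent word importance
        memory.append([g * wi + (1.0 - g) * ri for wi, ri in zip(w, r_mean)])
    return memory

def refine_regions(memory, regions, u, v):
    """Key-value read followed by a response gate, applied to every image region.

    u and v are hypothetical gate parameters (assumed, not from the thesis).
    """
    refined = []
    for r in regions:
        # Key addressing: similarity between each memory slot and this region.
        weights = softmax([dot(m, r) for m in memory])
        # Value reading: attention-weighted sum of the memory slots.
        read = [sum(wt * m[k] for wt, m in zip(weights, memory))
                for k in range(len(r))]
        # Response gate: decide how much of the memory read replaces the region.
        g = sigmoid(dot(u, read) + dot(v, r))
        refined.append([g * o + (1.0 - g) * x for o, x in zip(read, r)])
    return refined
```

Because the write gate depends on the averaged image features, the same word can receive a different weight for different initial images, which is the behaviour the abstract contrasts with a plain attention mechanism.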