
Research On Text To Image Technology Based On Generative Adversarial Networks

Posted on: 2020-08-22
Degree: Master
Type: Thesis
Country: China
Candidate: X J Chen
Full Text: PDF
GTID: 2428330590963042
Subject: Computer Science and Technology
Abstract/Summary:
In the era of big data, image information is mainly obtained by searching existing images. However, because image content is complex, it is often difficult to find the information that is actually needed. To let a computer automatically generate meaningful images on demand, the task of text-to-image synthesis has attracted wide attention. Text-to-image synthesis is a cross-modal task: given a sentence or a piece of text as input, the output is an image that conforms to the semantic information of the text. The task requires the computer not only to understand the semantics of the text but also to convert those semantics into pixels, which makes it very challenging. With the rapid development of deep learning, especially of generative adversarial networks, many methods have emerged that improve the quality of text-to-image synthesis. However, due to the complexity of the task, the quality of the generated images still needs further improvement. To this end, the work of this thesis is as follows:

(1) A text-to-image model combining category and reconstruction information. To improve the quality of the generated images, this thesis proposes a generative adversarial network model that combines category and reconstruction information. The model is trained in two stages: low-resolution images are generated in the first stage, and high-resolution images are synthesized in the second stage. In each stage, category information is added at the end of the discriminator, and pixel- and feature-level reconstruction terms are added to the generator loss to assist training. Experiments on the Oxford Flowers, Caltech-UCSD Birds, and MS COCO datasets show that the proposed model effectively improves the quality of text-to-image synthesis; the colors and details of the generated images are more delicate.

(2) A model for complex-scene text-to-
image generation combined with visual dialog. For text-to-image synthesis in complex scenes, a single text description cannot capture most of the details of the image, so it is difficult to generate high-quality images. Therefore, this thesis proposes a model to improve the quality of text-to-image synthesis in complex scenes. To capture the correlation between the visual dialog and the local regions of the corresponding image, an attention-based visual-semantic embedding model is constructed to obtain a feature representation of the visual dialog. This feature is then concatenated with the text-description feature obtained from the description encoder, and the image corresponding to the text is generated by a generative adversarial network model that combines the visual-semantic embedding with category and reconstruction information. Experiments on the MS COCO dataset show that the images generated after adding the visual dialog are clearer, with richer colors and details, which effectively improves the quality of text-to-image synthesis.
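The abstract does not give the exact loss formulation for contribution (1). As a minimal sketch, assuming the generator objective is a weighted sum of an adversarial term, a category-classification term, and pixel- and feature-level reconstruction terms (the function name, arguments, and weights `w_cat`/`w_pix`/`w_feat` are illustrative, not from the thesis), the combination might look like:

```python
import numpy as np

def generator_loss(adv_loss, cat_loss, fake_pixels, real_pixels,
                   fake_feats, real_feats, w_cat=1.0, w_pix=1.0, w_feat=1.0):
    """Combine adversarial, category, and reconstruction terms.

    The weights are hypothetical hyperparameters; the abstract only states
    that category information and pixel/feature reconstruction information
    are added to the generator's loss to assist training.
    """
    # L2 reconstruction of generated pixels against the real image
    pix_recon = np.mean((fake_pixels - real_pixels) ** 2)
    # L2 reconstruction of intermediate (e.g. discriminator) features
    feat_recon = np.mean((fake_feats - real_feats) ** 2)
    return adv_loss + w_cat * cat_loss + w_pix * pix_recon + w_feat * feat_recon
```

In a two-stage setup, such a loss would be evaluated once per stage, on the low-resolution output in stage one and on the high-resolution output in stage two.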
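For contribution (2), the abstract describes attention between the visual dialog and local image regions, followed by concatenation ("splicing") with the description feature. A hedged sketch, assuming standard scaled dot-product attention and with all shapes and names chosen for illustration (the thesis's actual encoders and attention form are not specified in the abstract):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend_and_fuse(dialog_vec, region_feats, desc_vec):
    """Attention-pool image-region features with a visual-dialog query,
    then concatenate the pooled context with the description feature.

    dialog_vec:   (d,)  embedding of the visual dialog (hypothetical encoder)
    region_feats: (n, d) local image-region features
    desc_vec:     (k,)  text feature from the description encoder
    """
    d = dialog_vec.shape[0]
    # Scaled dot-product attention scores between dialog and each region
    scores = region_feats @ dialog_vec / np.sqrt(d)
    weights = softmax(scores)            # (n,), sums to 1
    context = weights @ region_feats     # (d,) attended region context
    # Splice with the description feature before feeding the generator
    return np.concatenate([context, desc_vec])
```

The concatenated vector would then condition the generative adversarial network, so that regions emphasized by the dialog contribute more to the fused representation.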
Keywords/Search Tags: Text to image, Generative adversarial networks, Visual-semantic embedding, Attention mechanism, Cross-modal