Research And Design Of Text-to-image Synthesis System Based On Cross-modal Correlation

Posted on:2021-08-29

Degree:Master

Type:Thesis

Country:China

Candidate:N Wang

Full Text:PDF

GTID:2518306308969699

Subject:Computer technology

Abstract/Summary:

Text-to-image synthesis aims to generate photographic images conditioned on textual descriptions. This challenging task has recently attracted considerable attention from the multimedia community because of its potential applications. Most recent approaches are built on generative adversarial network (GAN) models and synthesize images conditioned on a global linguistic representation. However, the sparsity of the global representation makes GANs difficult to train and leaves the generated images short of fine-grained information. This dissertation argues that text-to-image synthesis models should not only use the global linguistic representation as the generative condition but also take full account of the local linguistic representations. At the same time, the computing cost and hardware requirements of such models should stay within a reasonable range for commercial deployment. Current text-to-image synthesis models, however, often consume a large amount of memory because most of them stack multiple GANs to complete the task. This dissertation therefore aims to complete the task with a single GAN.

To address these problems, this dissertation proposes cross-modal global and local linguistic representations based generative adversarial networks (CGL-GAN), which incorporate the local linguistic representation into GANs. CGL-GAN constructs a generator to synthesize the target images and a discriminator to judge whether the generated images conform to the text description. In the discriminator, this dissertation proposes a cross-modal projection algorithm, which constructs the cross-modal correlation by projecting the image representations at high and low levels onto the global and local linguistic representations, respectively. Finally, this dissertation designs a loss function based on the hinge loss to train the CGL-GAN model and evaluates CGL-GAN on two publicly available datasets, CUB and MS-COCO. Extensive experiments demonstrate that incorporating fine-grained local linguistic information with cross-modal correlation can greatly improve the performance of text-to-image synthesis models, even when generating high-resolution images.

Based on the CGL-GAN model, this dissertation designs a text-to-image synthesis system. It completes the requirement analysis, general design, detailed design, and database design of the system, then describes the system testing, which includes functional testing and performance testing. The testing shows that the system meets all the functional requirements and that the server load is relatively low, which indicates that the system is complete and has potential commercial value.
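
The abstract does not give the formulas for the cross-modal projection or the hinge-based loss, so the following is only a minimal sketch under stated assumptions, not the dissertation's implementation. It assumes the projection can be approximated by inner products (high-level image feature with the sentence embedding, low-level region features with word embeddings) and uses the standard GAN hinge loss. The function names, the max-over-regions matching, and the feature layout are all hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def projection_score(global_img, sentence, local_imgs, words):
    """Hypothetical discriminator score (an assumption, not the thesis formula).

    global_img: high-level image feature vector
    sentence:   global (sentence-level) text embedding
    local_imgs: list of low-level region feature vectors
    words:      list of local (word-level) text embeddings
    """
    # Global term: project the high-level image feature onto the sentence embedding.
    global_term = dot(global_img, sentence)
    # Local term: match each word to its best-scoring image region, then average.
    local_term = sum(max(dot(r, w) for r in local_imgs) for w in words) / len(words)
    return global_term + local_term


def d_hinge_loss(real_scores, fake_scores):
    """Standard hinge loss for the discriminator."""
    real = sum(max(0.0, 1.0 - s) for s in real_scores) / len(real_scores)
    fake = sum(max(0.0, 1.0 + s) for s in fake_scores) / len(fake_scores)
    return real + fake


def g_hinge_loss(fake_scores):
    """Standard hinge loss for the generator: raise the score of generated images."""
    return -sum(fake_scores) / len(fake_scores)
```

In this sketch a single discriminator produces one scalar per image-text pair, which is consistent with the stated aim of using one GAN rather than a stack; how the dissertation actually combines the global and local terms is not specified in the abstract.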

Keywords/Search Tags:

GAN, Text-to-image Synthesis, Cross-modal, Deep Learning

Related items

1	Research On The Method Of Cross-modal Image And Text Retrieval Based On Deep Learning
2	Cross-Modal Retrieval Of Image-Text Based On Deep Learning
3	Research On Cross-modal Semantic Relationship Based Image Synthesis
4	Deep Network For Image-Text Cross-Modal Retrieval
5	Design And Implementation Of DCGAN-based Image-text Cross-modal Retrieval System
6	Unicoder-VL:A Universal Encoder For Vision And Language By Cross-modal Pre-training
7	Research On Hierarchical Supervised Cross-modal Image And Text Retrieval Based On Deep Hashing
8	Cross-Modal Face Recognition Based On Deep Learning
9	Research On Content Sifting And Storage Mechanism Of Cross-modal Image And Text Data Based On Semantic Similarity
10	Image-text Cross-modal Retrieval Based On Deep Hashing Method