
Research And Design Of Text-to-image Synthesis System Based On Cross-modal Correlation

Posted on: 2021-08-29
Degree: Master
Type: Thesis
Country: China
Candidate: N Wang
Full Text: PDF
GTID: 2518306308969699
Subject: Computer technology
Abstract/Summary:
The task of text-to-image synthesis is to generate photorealistic images conditioned on given textual descriptions. This challenging task has recently attracted considerable attention from the multimedia community because of its potential applications. Most recent approaches are built on generative adversarial network (GAN) models and synthesize images conditioned on a global linguistic representation. However, the sparsity of the global representation makes GANs difficult to train and leaves the generated images short of fine-grained information. This dissertation argues that text-to-image synthesis models should not only use the global linguistic representation as the generative condition but also take full account of the local linguistic representations. At the same time, the computing cost and hardware requirements of text-to-image synthesis models should stay within a reasonable range for commercial deployment. Current models often consume a large amount of memory because most of them stack multiple GANs to complete the task. This dissertation therefore aims to do all the work with a single GAN.

To address these problems, this dissertation proposes cross-modal global and local linguistic representation-based generative adversarial networks (CGL-GAN), which incorporate the local linguistic representation into GANs. CGL-GAN constructs a generator to synthesize the target images and a discriminator to judge whether the generated images conform to the text description. In the discriminator, this dissertation proposes a cross-modal projection algorithm, which constructs the cross-modal correlation by projecting the image representations at high and low levels onto the global and local linguistic representations, respectively. Finally, this dissertation designs a loss function based on hinge loss to train the CGL-GAN model and evaluates CGL-GAN on two publicly available datasets, CUB and MS-COCO. Extensive experiments demonstrate that incorporating fine-grained local linguistic information with cross-modal correlation can greatly improve the performance of text-to-image synthesis models, even when generating high-resolution images.

Based on the CGL-GAN model, this dissertation designs a text-to-image synthesis system. It completes the requirement analysis, general design, detailed design and database design of the system, and then describes the system testing, which comprises functional testing and performance testing. The testing shows that the system meets all of its functional requirements and that the server load is relatively low, which indicates that the text-to-image synthesis system designed in this dissertation is complete and has potential commercial value.
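
The abstract names two ingredients of the discriminator, a cross-modal projection and a hinge-based loss, without giving their formulas. The following is a minimal sketch of the general idea only, assuming the standard hinge GAN objective and a projection-style score (an unconditional term plus the inner product of an image feature with a text feature). The function names, the feature shapes and the weight vector `w_out` are hypothetical and are not taken from the dissertation; the sketch shows a single feature level, whereas CGL-GAN is described as projecting high-level and low-level image features onto global and local linguistic representations, respectively.

```python
def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def projection_score(image_feat, text_feat, w_out):
    """Projection-style discriminator score (assumed form, one level only).

    image_feat: image feature vector from the discriminator
    text_feat:  linguistic feature vector of the same dimension
    w_out:      hypothetical weights of the unconditional output layer
    """
    unconditional = dot(image_feat, w_out)    # "does the image look real?"
    conditional = dot(image_feat, text_feat)  # "does the image match the text?"
    return unconditional + conditional


def d_hinge_loss(real_scores, fake_scores):
    """Standard hinge loss for the discriminator (assumed, not the thesis's exact loss)."""
    real_term = sum(max(0.0, 1.0 - s) for s in real_scores) / len(real_scores)
    fake_term = sum(max(0.0, 1.0 + s) for s in fake_scores) / len(fake_scores)
    return real_term + fake_term


def g_hinge_loss(fake_scores):
    """Standard hinge-GAN generator loss: raise the scores of generated images."""
    return -sum(fake_scores) / len(fake_scores)


if __name__ == "__main__":
    score = projection_score([1.0, 2.0], [3.0, 4.0], [1.0, 1.0])
    print(score)
    print(d_hinge_loss([2.0, 0.5], [-2.0, 0.0]))
    print(g_hinge_loss([1.0, 3.0]))
```

In this form the discriminator is only penalized for real scores below 1 and fake scores above -1, so samples that are already classified with a margin contribute nothing to the loss.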
Keywords/Search Tags: GAN, Text-to-image Synthesis, Cross-modal, Deep Learning