
Research On Text-to-image Based On Deep Learning

Posted on: 2021-12-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T Hu
Full Text: PDF
GTID: 1488306290485474
Subject: Computer application technology
Abstract/Summary:
Text-to-image synthesis, as a key research problem in computer vision, has received great attention from academia and industry. It maps the semantic information of a text description to corresponding pixel information and synthesizes one or more images that match the description. Text-to-image synthesis can reduce the cost for users of obtaining images on the Internet, and it can simulate the different imaginations humans have for the same text description. It can also visually reproduce the content of a text or dialogue in real time and can be used to extract visual features of a text description; since these visual features can in turn improve image recognition, text-to-image synthesis can be widely applied in computer education, social-media entertainment, and other fields.

Text-to-image synthesis faces many difficulties. For example, existing generative adversarial networks (GANs) for text-to-image focus on improving the quality of synthetic images while ignoring the diversity of text understanding, and their synthetic images are often visually inconsistent with the corresponding real images. Visual recognition of social-media images is also difficult: common image-recognition methods require task-specific datasets and do not effectively exploit the text data paired with the images.

This thesis takes text-to-image synthesis and visual recognition of social-media images as its research objects, focusing on text-to-image synthesis based on deep GANs. We first propose a diversity GAN that generates diverse images conditioned on text. We then introduce attention into the diversity GAN to obtain a diversity conditional GAN, in which mode seeking is used to increase the diversity among synthetic images. We further study the category attributes of images to impose visual constraints during text-to-image synthesis. Finally, we propose a visual representation of text based on the diversity conditional GAN to improve the performance of image recognition. Specifically, the main contributions of this thesis are as follows:

(1) A diversity generative adversarial network (DGAN) based on multiple random noises is proposed, which simultaneously generates multiple synthetic images with remarkable diversity from a single text. Traditional GANs are insensitive to multiple random noises at the input. Focusing on the question "how to generate a batch of realistic images with different appearances at the same time," this thesis extends the traditional "single discriminator, single generator" adversarial model to "single discriminator, multiple generators": the fusion of the text feature with multiple random noises is fed into a composite generator composed of multiple sub-generators. Conditional and unconditional loss functions shared by the single discriminator and the multiple generators are designed to optimize the diverse generators synchronously.

(2) An attention-based diversity conditional GAN (DCGAN) is proposed, which associates the words of the text with regions of the synthetic images to improve image quality. To break the limiting assumption of traditional GANs that "images with similar text context tend to describe similar scenes," the model uses attention to connect words with the regions of the K synthetic images, and it applies mode seeking, which computes the ratio of the image-feature difference to the noise-feature difference across the K groups, to increase the diversity among the synthetic images.

(3) A category-relativistic diverse conditional GAN for text-to-image synthesis is proposed, which addresses the inconsistency of the main visual features between synthetic and real images in traditional GANs. To address the problem that "there is a very obvious visual difference between the synthetic image and the corresponding real image," the model explores the relationship and the category consistency between synthetic and real images, and it uses a relativistic discrimination loss and a category-consistent loss to improve the quality of synthetic images. The model combines the visual features of synthetic and real images and uses a softmax layer with cross-entropy to estimate the category probability of the combined feature; the category-consistent loss thus enforces consistency of the visual features of synthetic and real images in the global visual space.

(4) A visual-recognition method based on text-to-image synthesis is developed, in which a visual representation of text is proposed to improve the performance of image classification and semantic recognition. A common visual feature is first extracted from the K synthetic images; this feature is regarded as a visual interpretation of the text in the visual feature space. The model then combines it with the feature of the real image and the feature of the text, which effectively improves the performance of real-image recognition.

This thesis validates the proposed generative adversarial networks on the Caltech-UCSD Birds-200-2011 dataset and the Oxford 102 Flower dataset, comparing them with baselines on the quality of synthetic images and the diversity among synthetic images. The experimental results indicate that the proposed text-to-image networks effectively improve both the quality and the diversity of synthetic images. A visualization experiment of the attention-based conditional GAN is also conducted on the COCO dataset. The proposed visual-recognition model based on text-to-image synthesis is validated on the Caltech-UCSD Birds-200-2011, Oxford 102 Flower, and MS COCO datasets; the experimental results prove that the proposed visual representation of text effectively improves the performance of image visual recognition.
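The "single discriminator, multiple generators" design of the DGAN in contribution (1) can be sketched in a few lines of numpy. This is a minimal illustration only: the linear generators, the tanh output, and all dimensions are illustrative assumptions, not the thesis's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

TEXT_DIM, NOISE_DIM, IMG_DIM, K = 16, 8, 32, 3  # illustrative sizes

# One weight matrix per sub-generator: the composite generator is made of
# K generators that share a single discriminator during training.
gen_weights = [rng.standard_normal((TEXT_DIM + NOISE_DIM, IMG_DIM)) * 0.1
               for _ in range(K)]

def composite_generate(text_emb, noises):
    """Fuse the text embedding with each of the K random noises and run
    each fused vector through its own generator, yielding K outputs
    with different appearances for the same text."""
    outputs = []
    for W, z in zip(gen_weights, noises):
        fused = np.concatenate([text_emb, z])  # text-feature + noise fusion
        outputs.append(np.tanh(fused @ W))     # one synthetic "image" vector
    return np.stack(outputs)

text_emb = rng.standard_normal(TEXT_DIM)
noises = [rng.standard_normal(NOISE_DIM) for _ in range(K)]
imgs = composite_generate(text_emb, noises)
print(imgs.shape)  # (3, 32): K diverse outputs for one text
```

In the real model each sub-generator would be a deep convolutional network and the shared discriminator would score all K outputs with both conditional and unconditional losses.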
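The mode-seeking term used in contribution (2) rewards pairs of noises that map to visibly different images. The following numpy sketch computes the mean pairwise ratio of image-feature distance to noise distance across the K groups; the L1 distance and toy dimensions are assumptions for illustration.

```python
import numpy as np

def mode_seeking_loss(img_feats, noises, eps=1e-8):
    """Mean pairwise ratio of image-feature distance to noise distance
    across the K groups. Training would maximise this ratio (i.e. minimise
    its negative) so that different noises yield different images."""
    ratios = []
    K = len(noises)
    for i in range(K):
        for j in range(i + 1, K):
            d_img = np.abs(img_feats[i] - img_feats[j]).mean()
            d_z = np.abs(noises[i] - noises[j]).mean()
            ratios.append(d_img / (d_z + eps))
    return float(np.mean(ratios))

rng = np.random.default_rng(1)
noises = [rng.standard_normal(4) for _ in range(3)]
diverse_imgs = [rng.standard_normal(8) for _ in range(3)]   # distinct images
collapsed_imgs = [np.zeros(8) for _ in range(3)]            # mode collapse

print(mode_seeking_loss(diverse_imgs, noises))    # positive: images differ
print(mode_seeking_loss(collapsed_imgs, noises))  # 0.0: identical images
```

A collapsed generator scores zero, so maximising the ratio directly penalises identical outputs for distinct noises.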
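The category-consistent loss of contribution (3) can be sketched as follows: concatenate the synthetic- and real-image features, classify the joint feature with a softmax layer, and take cross-entropy against the image's category. The feature dimension, class count, and random classifier weights here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
FEAT_DIM, NUM_CLASSES = 8, 5   # illustrative sizes
W_cls = rng.standard_normal((2 * FEAT_DIM, NUM_CLASSES)) * 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def category_consistent_loss(feat_syn, feat_real, label):
    """Combine the synthetic and real visual features, estimate the
    category probability of the combined feature with a softmax layer,
    and penalise with cross-entropy so both features must carry the
    same category-level visual content."""
    combined = np.concatenate([feat_syn, feat_real])
    probs = softmax(combined @ W_cls)          # category probabilities
    return float(-np.log(probs[label] + 1e-12))

loss = category_consistent_loss(rng.standard_normal(FEAT_DIM),
                                rng.standard_normal(FEAT_DIM), label=2)
print(loss)  # positive cross-entropy value
```

Minimising this loss pulls the synthetic image's visual features toward the real image's category in the global visual space, complementing the relativistic discrimination loss.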
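The text visual representation of contribution (4) pools the features of the K synthetic images into one "visual interpretation" of the text and fuses it with the real-image and text features. This numpy sketch uses mean pooling and concatenation as a plausible fusion; the pooling choice and all dimensions are assumptions, not the thesis's stated design.

```python
import numpy as np

def text_visual_representation(syn_feats, real_feat, text_feat):
    """Mean-pool the features of the K synthetic images into a single
    common visual description of the text, then fuse it with the
    real-image feature and the text feature for downstream recognition."""
    common = syn_feats.mean(axis=0)   # shared visual interpretation of text
    return np.concatenate([real_feat, text_feat, common])

rng = np.random.default_rng(3)
syn_feats = rng.standard_normal((4, 16))  # K=4 synthetic-image features
fused = text_visual_representation(syn_feats,
                                   rng.standard_normal(16),  # real image
                                   rng.standard_normal(10))  # text feature
print(fused.shape)  # (42,): 16 real + 10 text + 16 common visual
```

The fused vector would then feed a standard classifier, which is how the text's visual features can improve recognition of the real image.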
Keywords/Search Tags: generative adversarial networks, text-to-image, text visual representation, visual recognition