
Research On Cross-modal Image Modification Method Based On Generative Adversarial Network

Posted on: 2022-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: Q Q Nie
Full Text: PDF
GTID: 2518306563464804
Subject: Computer technology

Abstract/Summary:
Image generation has long been one of the key research areas of artificial intelligence, and the field entered a new stage of development with the emergence of generative adversarial networks in 2014. Image generation has a wide range of applications: it can be used to synthesize realistic data, fill in missing data, support reinforcement learning, and address multi-modal output problems. Traditional single-modal image generation models take an original image or random noise as input and can generally only translate images between two or more fixed domains, which greatly restricts their generative ability and flexibility. In contrast, cross-modal image generation models driven by text descriptions are far more flexible because of the textual information they introduce.

This thesis is devoted to cross-modal image modification, a sub-field of cross-modal image generation. Models of this type currently face three problems. First, existing cross-modal image modification models are studied mainly on flower and bird data sets, so their practical application value is limited. Second, because text and images belong to different modalities, the two must be accurately mapped to each other. Third, low generation quality and blurry details are common in such models. This thesis proposes new solutions to these problems; the main work and contributions are as follows.

First, text descriptions are manually annotated for the VeRi776 traffic data set. Very few data sets are currently available for cross-modal image modification: most related work is done on the CUB-200-2011 Birds and Oxford 102 Flowers data sets, which are of academic interest but of limited practical value. For this reason, this thesis first manually annotates text descriptions for the VeRi776 data set, which is widely used in intelligent transportation research. The results of cross-modal image modification on this data set can be applied directly to downstream tasks in intelligent transportation, such as vehicle model recognition, vehicle classification, vehicle tracking, and vehicle re-identification.

Second, a cross-modal image modification model based on a generative adversarial network is proposed. To address the small number of text descriptions per image in the annotated traffic data set and the low quality of generated images, the generator adopts a two-stage generative model that better captures the details of the original image, and a text-adaptive discriminator lets the model efficiently capture the fine-grained correspondence between text and image. Sufficient experiments on the self-annotated cross-modal traffic image data set verify the effectiveness of the model from the perspectives of subjective visual evaluation and objective quantitative metrics; a sketch of the text-adaptive discriminator idea is given below.
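The abstract does not include code, so the following is a minimal PyTorch sketch of the text-adaptive discriminator idea (the term originates from TAGAN, Nam et al., 2018): each word of the description parameterizes a small word-level discriminator over the image feature, and an attention over words aggregates the word-level decisions. All module names, layer sizes, and the aggregation rule here are illustrative assumptions, not the author's implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TextAdaptiveDiscriminator(nn.Module):
        """Word-level text-image matching discriminator (illustrative)."""
        def __init__(self, img_dim=512, word_dim=300):
            super().__init__()
            # Image encoder producing a global feature vector
            # (backbone is an assumption; the abstract does not specify it).
            self.img_encoder = nn.Sequential(
                nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
                nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
                nn.Conv2d(128, img_dim, 4, 2, 1), nn.LeakyReLU(0.2),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            # Each word embedding is mapped to the weight and bias of a
            # tiny per-word "local discriminator" on the image feature.
            self.word_to_weight = nn.Linear(word_dim, img_dim)
            self.word_to_bias = nn.Linear(word_dim, 1)
            # Attention over words: unimportant words get low weight.
            self.word_attn = nn.Linear(word_dim, 1)

        def forward(self, image, word_embs):
            # image: (B, 3, H, W); word_embs: (B, T, word_dim)
            v = self.img_encoder(image)                        # (B, img_dim)
            w = self.word_to_weight(word_embs)                 # (B, T, img_dim)
            b = self.word_to_bias(word_embs).squeeze(-1)       # (B, T)
            # Per-word match score between the image and each word.
            local = torch.sigmoid((w * v.unsqueeze(1)).sum(-1) + b)  # (B, T)
            alpha = F.softmax(self.word_attn(word_embs).squeeze(-1), dim=1)
            # Weighted aggregation of word-level decisions (TAGAN uses a
            # product; a weighted sum is used here for simplicity).
            return (alpha * local).sum(dim=1)                  # (B,)

Because the discriminator's decision is fine-grained at the word level, every word supervises the image feature directly, which is what makes it possible to learn the text-image correspondence from relatively few descriptions.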
Third, a cross-modal image modification model incorporating attention mechanisms is proposed. A channel-spatial attention network and a self-attention layer are added to the image feature extraction module of the cross-modal image modification model. These modules strengthen the mapping between fine-grained image features and text features, so that the model can understand both the global and the local features of the image and extract features more effectively. Detailed comparisons of the images generated before and after adding the attention mechanisms, from the perspectives of subjective visual evaluation and objective quantitative metrics, fully verify the effectiveness of fusing the attention mechanisms; a sketch of the two attention modules follows.
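Below is a minimal PyTorch sketch of the two attention modules named above: a channel-spatial attention block (in the style of CBAM) and a self-attention layer (in the style of SAGAN). The exact designs, dimensions, and placement within the feature extraction module are assumptions; the abstract only names the module types.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ChannelSpatialAttention(nn.Module):
        """CBAM-style block: re-weight channels, then spatial locations."""
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.mlp = nn.Sequential(                 # channel attention MLP
                nn.Linear(channels, channels // reduction), nn.ReLU(),
                nn.Linear(channels // reduction, channels),
            )
            self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

        def forward(self, x):                         # x: (B, C, H, W)
            b, c, _, _ = x.shape
            avg = self.mlp(x.mean(dim=(2, 3)))        # avg-pooled channels
            mx = self.mlp(x.amax(dim=(2, 3)))         # max-pooled channels
            x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
            pooled = torch.cat([x.mean(1, keepdim=True),
                                x.amax(1, keepdim=True)], dim=1)
            return x * torch.sigmoid(self.spatial(pooled))

    class SelfAttention(nn.Module):
        """SAGAN-style self-attention over all spatial positions."""
        def __init__(self, channels):
            super().__init__()
            self.q = nn.Conv2d(channels, channels // 8, 1)
            self.k = nn.Conv2d(channels, channels // 8, 1)
            self.v = nn.Conv2d(channels, channels, 1)
            self.gamma = nn.Parameter(torch.zeros(1))  # learned residual gate

        def forward(self, x):                         # x: (B, C, H, W)
            b, c, h, w = x.shape
            q = self.q(x).flatten(2).transpose(1, 2)       # (B, HW, C/8)
            k = self.k(x).flatten(2)                       # (B, C/8, HW)
            attn = F.softmax(torch.bmm(q, k), dim=-1)      # (B, HW, HW)
            v = self.v(x).flatten(2)                       # (B, C, HW)
            out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
            return self.gamma * out + x

The channel-spatial block captures local, fine-grained cues while the self-attention layer relates distant regions, matching the abstract's claim that the model understands both local and global image features.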
Keywords/Search Tags: Image Modification, Generative Adversarial Network, Cross-Modal, Visual-Semantic Embedding