
Research On Cross-modal Image Modification Method Based On Generative Adversarial Network

Posted on: 2022-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: Q Q Nie
Full Text: PDF
GTID: 2518306563464804
Subject: Computer technology

Abstract/Summary:
Image generation has long been one of the key research areas of artificial intelligence, and the field entered a new stage of development with the emergence of generative adversarial networks in 2014. Image generation has a wide range of applications: it can be used to synthesize realistic data, fill in missing data, support reinforcement learning, and address multi-modal output problems. Traditional single-modal image generation models take an original image or random noise as input and can generally only translate images between two or more fixed domains, which greatly restricts their generative ability and flexibility. In contrast, cross-modal image generation models driven by text descriptions are far more flexible because of the textual information they introduce.

This thesis is devoted to cross-modal image modification, a sub-field of cross-modal image generation. Models of this type currently face three problems. First, existing cross-modal image modification models are studied mainly on flower and bird data sets, so their practical application value is limited. Second, because text and images belong to different modalities, the two must be accurately mapped to each other. Third, low generation quality and blurry details are common in such models. This thesis proposes new solutions to these problems; the main work and contributions are as follows.

First, text descriptions are manually annotated for the VeRi776 traffic data set. Very few data sets are currently available for cross-modal image modification: most related work is done on the CUB-200-2011 Birds and Oxford 102 Flowers data sets, which are of academic interest but of limited practical value. For this reason, this thesis first manually annotates text descriptions for the VeRi776 data set, which is widely used in intelligent transportation research. The results of cross-modal image modification on this data set can be applied directly to downstream tasks in intelligent transportation, such as vehicle model recognition, vehicle classification, vehicle tracking, and vehicle re-identification.

Second, a cross-modal image modification model based on a generative adversarial network is proposed. To address the small number of text descriptions per image in the annotated traffic data set and the low quality of generated images, the generator adopts a two-stage generative model that better captures the details of the original image, and a text-adaptive discriminator lets the model efficiently capture the fine-grained correspondence between text and image. Sufficient experiments on the self-annotated cross-modal traffic image data set verify the effectiveness of the model from the perspectives of subjective visual evaluation and objective quantitative metrics; a sketch of the text-adaptive discriminator idea is given below.
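The abstract does not include code, so the following is a minimal PyTorch sketch of the text-adaptive discriminator idea (the term originates from TAGAN, Nam et al., 2018): each word of the description parameterizes a small word-level discriminator over the image feature, and an attention over words aggregates the word-level decisions. All module names, layer sizes, and the aggregation rule here are illustrative assumptions, not the author's implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TextAdaptiveDiscriminator(nn.Module):
        """Word-level text-image matching discriminator (illustrative)."""
        def __init__(self, img_dim=512, word_dim=300):
            super().__init__()
            # Image encoder producing a global feature vector
            # (backbone is an assumption; the abstract does not specify it).
            self.img_encoder = nn.Sequential(
                nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.2),
                nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
                nn.Conv2d(128, img_dim, 4, 2, 1), nn.LeakyReLU(0.2),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            # Each word embedding is mapped to the weight and bias of a
            # tiny per-word "local discriminator" on the image feature.
            self.word_to_weight = nn.Linear(word_dim, img_dim)
            self.word_to_bias = nn.Linear(word_dim, 1)
            # Attention over words: unimportant words get low weight.
            self.word_attn = nn.Linear(word_dim, 1)

        def forward(self, image, word_embs):
            # image: (B, 3, H, W); word_embs: (B, T, word_dim)
            v = self.img_encoder(image)                        # (B, img_dim)
            w = self.word_to_weight(word_embs)                 # (B, T, img_dim)
            b = self.word_to_bias(word_embs).squeeze(-1)       # (B, T)
            # Per-word match score between the image and each word.
            local = torch.sigmoid((w * v.unsqueeze(1)).sum(-1) + b)  # (B, T)
            alpha = F.softmax(self.word_attn(word_embs).squeeze(-1), dim=1)
            # Weighted aggregation of word-level decisions (TAGAN uses a
            # product; a weighted sum is used here for simplicity).
            return (alpha * local).sum(dim=1)                  # (B,)

Because the discriminator's decision is fine-grained at the word level, every word supervises the image feature directly, which is what makes it possible to learn the text-image correspondence from relatively few descriptions.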
Third, a cross-modal image modification model incorporating attention mechanisms is proposed. A channel-spatial attention network and a self-attention layer are added to the image feature extraction module of the cross-modal image modification model. These modules strengthen the mapping between fine-grained image features and text features, so that the model can understand both the global and the local features of the image and extract features more effectively. Detailed comparisons of the images generated before and after adding the attention mechanisms, from the perspectives of subjective visual evaluation and objective quantitative metrics, fully verify the effectiveness of fusing the attention mechanisms; a sketch of the two attention modules follows.
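Below is a minimal PyTorch sketch of the two attention modules named above: a channel-spatial attention block (in the style of CBAM) and a self-attention layer (in the style of SAGAN). The exact designs, dimensions, and placement within the feature extraction module are assumptions; the abstract only names the module types.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ChannelSpatialAttention(nn.Module):
        """CBAM-style block: re-weight channels, then spatial locations."""
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.mlp = nn.Sequential(                 # channel attention MLP
                nn.Linear(channels, channels // reduction), nn.ReLU(),
                nn.Linear(channels // reduction, channels),
            )
            self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

        def forward(self, x):                         # x: (B, C, H, W)
            b, c, _, _ = x.shape
            avg = self.mlp(x.mean(dim=(2, 3)))        # avg-pooled channels
            mx = self.mlp(x.amax(dim=(2, 3)))         # max-pooled channels
            x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
            pooled = torch.cat([x.mean(1, keepdim=True),
                                x.amax(1, keepdim=True)], dim=1)
            return x * torch.sigmoid(self.spatial(pooled))

    class SelfAttention(nn.Module):
        """SAGAN-style self-attention over all spatial positions."""
        def __init__(self, channels):
            super().__init__()
            self.q = nn.Conv2d(channels, channels // 8, 1)
            self.k = nn.Conv2d(channels, channels // 8, 1)
            self.v = nn.Conv2d(channels, channels, 1)
            self.gamma = nn.Parameter(torch.zeros(1))  # learned residual gate

        def forward(self, x):                         # x: (B, C, H, W)
            b, c, h, w = x.shape
            q = self.q(x).flatten(2).transpose(1, 2)       # (B, HW, C/8)
            k = self.k(x).flatten(2)                       # (B, C/8, HW)
            attn = F.softmax(torch.bmm(q, k), dim=-1)      # (B, HW, HW)
            v = self.v(x).flatten(2)                       # (B, C, HW)
            out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
            return self.gamma * out + x

The channel-spatial block captures local, fine-grained cues while the self-attention layer relates distant regions, matching the abstract's claim that the model understands both local and global image features.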
Keywords/Search Tags: Image Modification, Generative Adversarial Network, Cross-Modal, Visual-Semantic Embedding