
Research On Cross-modal Generation From Text To Person Image

Posted on: 2021-05-20
Degree: Master
Type: Thesis
Country: China
Candidate: T Huang
GTID: 2428330602986096
Subject: Control Science and Engineering

Abstract/Summary:
The purpose of the text-to-image generation task is to convert the semantic relationships described in a text into semantically related images using image generation technology, which has great application value in text-image matching, user portraits, and interactive creation. With the rise and rapid development of generative adversarial networks, researchers have proposed many models that continuously improve the quality of general images generated from text descriptions. However, because person images impose varied requirements on pose, body shape, appearance attributes, and texture, research on generating person images from text descriptions still has much room for improvement in image clarity, preservation or transformation of person poses, and preservation of personal identity characteristics. In response to these problems, this thesis designs two new generative models based on the generative adversarial network framework and constructs a new dataset, as follows:

(1) To address the weaknesses of existing models in pose preservation, feature invariance, and texture detail, this thesis proposes a text-to-person-image generation model based on spatial structure adaptive normalization. The model outputs person images from low to high resolution in a multi-stage manner. In the early stage, the model introduces adaptive normalization, which directly maps text information to image pixel information; in the later stage, the original person's features are introduced, and the coarse images generated in the previous stage are progressively refined to sharpen texture details (a minimal code sketch of the normalization idea follows this abstract). Experiments on the large public dataset DeepFashion show that the proposed model effectively improves the quality of the generated images, preserves the person's pose and identity characteristics, and produces finer color and texture details.

(2) To address the inability of existing models to generate multi-pose person images, this thesis proposes a multi-module, multi-pose person image generation model based on generative adversarial networks, which combines text-to-person-image generation and pose transfer in multiple modes. A multi-mode fashion manipulation network composed of four modules is designed, which uses a person parser map to decouple person pose transfer from text-guided image rendering. By adding or removing the corresponding modules, the model can generate a person image either in a fixed pose or in any specified pose (a sketch of this modular pipeline also follows the abstract). Experiments on the large public dataset DeepFashion and the self-built dataset MPV-Text show that the proposed model effectively improves the resolution of the generated person images, renders finer color and detail, and allows the person's pose to be manipulated arbitrarily.

(3) To address the facts that existing text-image datasets have low image resolution and lack person images with paired poses, this thesis constructs a multi-pose text-person image dataset named MPV-Text. Extensive experiments on MPV-Text verify the feasibility of the proposed models. The construction and release of the dataset should help advance research in related fields.
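To make the adaptive normalization in contribution (1) concrete, here is a minimal PyTorch sketch of a normalization layer whose affine parameters are predicted from a text embedding; the class name, layer sizes, and the use of a pre-pooled sentence embedding are illustrative assumptions, not the thesis's exact architecture.

import torch
import torch.nn as nn

class TextAdaptiveNorm(nn.Module):
    """Normalize image features, then modulate them with scale/shift
    parameters predicted from a text embedding (hypothetical layer)."""
    def __init__(self, num_channels: int, text_dim: int, hidden: int = 128):
        super().__init__()
        # Parameter-free normalization; the text branch supplies the affine part.
        self.norm = nn.InstanceNorm2d(num_channels, affine=False)
        self.shared = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.to_gamma = nn.Linear(hidden, num_channels)
        self.to_beta = nn.Linear(hidden, num_channels)

    def forward(self, x: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) image features; text_emb: (B, text_dim)
        h = self.shared(text_emb)
        gamma = self.to_gamma(h).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        beta = self.to_beta(h).unsqueeze(-1).unsqueeze(-1)
        return self.norm(x) * (1 + gamma) + beta

# Usage: modulate a 64-channel feature map with a 256-d sentence embedding.
layer = TextAdaptiveNorm(num_channels=64, text_dim=256)
out = layer(torch.randn(2, 64, 32, 32), torch.randn(2, 256))  # (2, 64, 32, 32)

Broadcasting per-channel scale and shift over all spatial positions is the simplest variant; a spatially-adaptive version in the spirit of the thesis's "spatial structure" conditioning would instead predict per-pixel gamma and beta maps from a structure input such as a person parser map.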
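Contribution (2) hinges on composing separable modules, so the following sketch (under the same caveat: hypothetical module names, interfaces, and toy one-layer bodies) shows how a person parser map might first be transferred to a target pose and then rendered from text, with the pose-transfer module optional.

class PoseTransferNet(nn.Module):
    """Predict a pose-aligned parser map from the source map plus target-pose
    keypoint heatmaps (toy one-layer stand-in for a real transfer module)."""
    def __init__(self, parts: int = 20, pose_ch: int = 18):
        super().__init__()
        self.net = nn.Conv2d(parts + pose_ch, parts, kernel_size=3, padding=1)

    def forward(self, parser_map, target_pose):
        return self.net(torch.cat([parser_map, target_pose], dim=1))

class TextRenderNet(nn.Module):
    """Render an RGB image from a parser map, modulated by a text embedding."""
    def __init__(self, parts: int = 20, text_dim: int = 256):
        super().__init__()
        self.film = nn.Linear(text_dim, parts)  # per-part gains from text
        self.net = nn.Conv2d(parts, 3, kernel_size=3, padding=1)

    def forward(self, parser_map, text_emb):
        scale = self.film(text_emb).unsqueeze(-1).unsqueeze(-1)  # (B, parts, 1, 1)
        return torch.tanh(self.net(parser_map * scale))

def generate(parser_map, text_emb, pose_net, render_net, target_pose=None):
    # Omitting the pose-transfer step gives fixed-pose generation;
    # supplying a target pose lets the caller choose an arbitrary pose.
    if target_pose is not None:
        parser_map = pose_net(parser_map, target_pose)
    return render_net(parser_map, text_emb)

# Usage with toy tensors: 20-part parser map, 18-keypoint pose heatmaps.
img = generate(torch.randn(1, 20, 64, 64), torch.randn(1, 256),
               PoseTransferNet(), TextRenderNet(),
               target_pose=torch.randn(1, 18, 64, 64))  # (1, 3, 64, 64)

Keeping pose transfer and text-guided rendering behind separate interfaces is what lets modules be added or removed per use case, which is the design point the abstract attributes to the four-module network.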
Keywords/Search Tags: Text-to-image generation, Generative adversarial network, Person image, Cross-modality