
Research On Cross-modal Generation From Text To Person Image

Posted on: 2021-05-20
Degree: Master
Type: Thesis
Country: China
Candidate: T Huang
GTID: 2428330602986096
Subject: Control Science and Engineering

Abstract/Summary:
The purpose of the text-to-image generation task is to convert the semantic relationships described in a text into semantically related images using image generation technology, which has great application value in text-image matching, user portraits, and interactive creation. With the rise and rapid development of generative adversarial networks, researchers have proposed many models that continuously improve the quality of general images generated from text descriptions. However, because person images impose varied requirements on pose, body shape, appearance attributes, and texture, research on generating person images from text descriptions still has much room for improvement in image clarity, preservation or transformation of person poses, and preservation of personal identity characteristics. In response to these problems, this thesis designs two new generative models based on the generative adversarial network framework and constructs a new dataset, as follows:

(1) To address the weaknesses of existing models in pose preservation, feature invariance, and texture detail, this thesis proposes a text-to-person-image generation model based on spatial structure adaptive normalization. The model outputs person images from low to high resolution in a multi-stage manner. In the early stage, the model introduces adaptive normalization, which directly maps text information to image pixel information; in the later stage, the original person's features are introduced, and the coarse images generated in the previous stage are progressively refined to sharpen texture details (a minimal code sketch of the normalization idea follows this abstract). Experiments on the large public dataset DeepFashion show that the proposed model effectively improves the quality of the generated images, preserves the person's pose and identity characteristics, and produces finer color and texture details.

(2) To address the inability of existing models to generate multi-pose person images, this thesis proposes a multi-module, multi-pose person image generation model based on generative adversarial networks, which combines text-to-person-image generation and pose transfer in multiple modes. A multi-mode fashion manipulation network composed of four modules is designed, which uses a person parser map to decouple person pose transfer from text-guided image rendering. By adding or removing the corresponding modules, the model can generate a person image either in a fixed pose or in any specified pose (a sketch of this modular pipeline also follows the abstract). Experiments on the large public dataset DeepFashion and the self-built dataset MPV-Text show that the proposed model effectively improves the resolution of the generated person images, renders finer color and detail, and allows the person's pose to be manipulated arbitrarily.

(3) To address the facts that existing text-image datasets have low image resolution and lack person images with paired poses, this thesis constructs a multi-pose text-person image dataset named MPV-Text. Extensive experiments on MPV-Text verify the feasibility of the proposed models. The construction and release of the dataset should help advance research in related fields.
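To make the adaptive normalization in contribution (1) concrete, here is a minimal PyTorch sketch of a normalization layer whose affine parameters are predicted from a text embedding; the class name, layer sizes, and the use of a pre-pooled sentence embedding are illustrative assumptions, not the thesis's exact architecture.

import torch
import torch.nn as nn

class TextAdaptiveNorm(nn.Module):
    """Normalize image features, then modulate them with scale/shift
    parameters predicted from a text embedding (hypothetical layer)."""
    def __init__(self, num_channels: int, text_dim: int, hidden: int = 128):
        super().__init__()
        # Parameter-free normalization; the text branch supplies the affine part.
        self.norm = nn.InstanceNorm2d(num_channels, affine=False)
        self.shared = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        self.to_gamma = nn.Linear(hidden, num_channels)
        self.to_beta = nn.Linear(hidden, num_channels)

    def forward(self, x: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) image features; text_emb: (B, text_dim)
        h = self.shared(text_emb)
        gamma = self.to_gamma(h).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        beta = self.to_beta(h).unsqueeze(-1).unsqueeze(-1)
        return self.norm(x) * (1 + gamma) + beta

# Usage: modulate a 64-channel feature map with a 256-d sentence embedding.
layer = TextAdaptiveNorm(num_channels=64, text_dim=256)
out = layer(torch.randn(2, 64, 32, 32), torch.randn(2, 256))  # (2, 64, 32, 32)

Broadcasting per-channel scale and shift over all spatial positions is the simplest variant; a spatially-adaptive version in the spirit of the thesis's "spatial structure" conditioning would instead predict per-pixel gamma and beta maps from a structure input such as a person parser map.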
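Contribution (2) hinges on composing separable modules, so the following sketch (under the same caveat: hypothetical module names, interfaces, and toy one-layer bodies) shows how a person parser map might first be transferred to a target pose and then rendered from text, with the pose-transfer module optional.

class PoseTransferNet(nn.Module):
    """Predict a pose-aligned parser map from the source map plus target-pose
    keypoint heatmaps (toy one-layer stand-in for a real transfer module)."""
    def __init__(self, parts: int = 20, pose_ch: int = 18):
        super().__init__()
        self.net = nn.Conv2d(parts + pose_ch, parts, kernel_size=3, padding=1)

    def forward(self, parser_map, target_pose):
        return self.net(torch.cat([parser_map, target_pose], dim=1))

class TextRenderNet(nn.Module):
    """Render an RGB image from a parser map, modulated by a text embedding."""
    def __init__(self, parts: int = 20, text_dim: int = 256):
        super().__init__()
        self.film = nn.Linear(text_dim, parts)  # per-part gains from text
        self.net = nn.Conv2d(parts, 3, kernel_size=3, padding=1)

    def forward(self, parser_map, text_emb):
        scale = self.film(text_emb).unsqueeze(-1).unsqueeze(-1)  # (B, parts, 1, 1)
        return torch.tanh(self.net(parser_map * scale))

def generate(parser_map, text_emb, pose_net, render_net, target_pose=None):
    # Omitting the pose-transfer step gives fixed-pose generation;
    # supplying a target pose lets the caller choose an arbitrary pose.
    if target_pose is not None:
        parser_map = pose_net(parser_map, target_pose)
    return render_net(parser_map, text_emb)

# Usage with toy tensors: 20-part parser map, 18-keypoint pose heatmaps.
img = generate(torch.randn(1, 20, 64, 64), torch.randn(1, 256),
               PoseTransferNet(), TextRenderNet(),
               target_pose=torch.randn(1, 18, 64, 64))  # (1, 3, 64, 64)

Keeping pose transfer and text-guided rendering behind separate interfaces is what lets modules be added or removed per use case, which is the design point the abstract attributes to the four-module network.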
Keywords/Search Tags: Text-to-image generation, Generative adversarial network, Person image, Cross-modality