
Research On Person Image Generation And Video Synthesis Technology Of Arbitrary Human Pose Based On Generative Adversarial Networks

Posted on: 2022-08-01
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Wang
Full Text: PDF
GTID: 2518306488992609
Subject: Computer Science and Technology
Abstract/Summary:
Video synthesis guided by human pose has become a research hotspot in computer vision in recent years. It uses consecutive frames of human keypoints as semantic labels to generate videos of the same person in different poses, while keeping the appearance and texture of the target person consistent with those in the source video. Recent research shows that human pose synthesis has a wide range of applications, such as dance short-video production, movie special effects, video editing, motion imitation, appearance conversion, character animation, and video dataset expansion.

Although existing human pose synthesis methods can synthesize the contour of the target person from the pose keypoints and thereby change the person's pose, they usually preserve the person's details and the background texture of each frame poorly, especially the person's facial expression. In addition, traditional interframe generation methods generally suffer from jitter, so the played content is not smooth. To solve these problems, this thesis proposes an interframe image generation method based on generative adversarial networks that improves the person's details and the coherence between video frames. The main contents and innovations of this work are as follows:

1. To transfer the action sequence of the person in the source video to the target person, a human posture transformation network is proposed. This conditional generative adversarial network synthesizes the person image under the guidance of the target posture and overcomes the loss of appearance and texture details. The first generator, based on the principle of cross-domain communication, takes as input the skeleton image of the human pose extracted by a human pose detector and the consecutive frames of the target person's video, and synthesizes a rough image of the person in the conditional pose. A second generator performs local detail enhancement: it combines residual information and uses a perceptual reconstruction loss to constrain the difference between the generated image and the target image, improving the quality of the consecutive person frames needed to synthesize the video.

2. For image-to-video generation, a method based on generative adversarial networks is proposed to improve the spatial continuity of the frame images. Specifically, a continuous human action sequence generated by the human posture transformation network and the corresponding human skeleton reference images are used as inputs to generate the motion compensation frames of the interframe sequence. The complete human action sequence, including the motion compensation frames, is then fed to the discriminator to enhance the spatial continuity of adjacent video frames, so that the generated video is more in line with human visual perception.

3. To avoid unsatisfactory model output caused by the different body proportions of persons in different videos, a global posture normalization network is proposed to match the spatial positions of the skeleton images of the source person and the target person. A linear mapping is established between the maximum and minimum positions of the persons in the frames of the different videos and is used to translate the keypoints of the person's pose, so that the constructed skeleton image contains not only the action information of the source person but also the proportions of the target person's body and limbs.
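The linear mapping described in item 3 can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the function names `fit_linear_map` and `normalize_pose`, the box format `(x_min, y_min, x_max, y_max)`, and the choice to fit the x and y axes independently. The thesis may well use a different scheme, for example a single scale derived from body height.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Sequence[float]  # hypothetical format: (x_min, y_min, x_max, y_max)


def fit_linear_map(src_min: float, src_max: float,
                   tgt_min: float, tgt_max: float) -> Tuple[float, float]:
    """Return (scale, offset) such that src_min -> tgt_min and src_max -> tgt_max."""
    if src_max == src_min:
        raise ValueError("source range is degenerate; cannot fit a linear map")
    scale = (tgt_max - tgt_min) / (src_max - src_min)
    offset = tgt_min - scale * src_min
    return scale, offset


def normalize_pose(keypoints: List[Point], src_box: Box, tgt_box: Box) -> List[Point]:
    """Map source-person keypoints into the target person's position range.

    src_box and tgt_box are assumed to hold the minimum and maximum
    positions each person occupies across their own video.
    """
    # Assumption: each axis gets its own linear map.
    sx, ox = fit_linear_map(src_box[0], src_box[2], tgt_box[0], tgt_box[2])
    sy, oy = fit_linear_map(src_box[1], src_box[3], tgt_box[1], tgt_box[3])
    return [(sx * x + ox, sy * y + oy) for x, y in keypoints]


# Usage: a source person spanning a 100x200 region is mapped onto a
# target person who occupies a smaller region lower right in the frame.
source_pose = [(0.0, 0.0), (50.0, 100.0), (100.0, 200.0)]
mapped = normalize_pose(source_pose, (0, 0, 100, 200), (50, 100, 100, 200))
```

With this kind of mapping, the skeleton keeps the source person's motion while its extent follows the target person's range, which is the effect the abstract attributes to the normalization step.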
Keywords/Search Tags: Generative adversarial networks, Pose-guided person image generation, Interframe image generation, Person video synthesis