
Human Motion Generation Via Deep Learning

Posted on: 2020-01-03
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Huang
Full Text: PDF
GTID: 2428330623463624
Subject: Computer technology
Abstract/Summary:
Video generation, especially human motion video generation, has been attracting increasing research attention. Early methods directly apply or extend conventional 2D GANs (originally designed for 2D image generation) to generate 3D spatio-temporal video. However, these approaches usually yield low video quality (non-realistic-looking results) because of the high-dimensional search space. To address this, some recent methods constrain the generator with human skeleton information (e.g., skeleton figures or joint position maps) so as to output more realistic articulated human motions. These methods, however, also have significant limitations. First, most of them require a specific skeleton pattern for each frame to be synthesized; in other words, a sequence of skeleton representation vectors (or joint positions) must be given in advance for video generation. In most cases this information is very hard to obtain, which greatly limits their applicability. Second, these methods require pairs of image frames with the same background and the same person for supervised training. Obtaining such strongly supervised training data is very expensive, which in turn prevents further scaling up of training.

To explicitly address these issues, this work proposes a new problem setting: given a single static image containing a human figure (the "input person") and a video dataset of a specific human motion performed by other persons (e.g., walking or dancing, the "target motion"), we aim to generate a novel video sequence of the input person performing a similar motion. Note that the synthesized motion images need not follow exactly the same motion (i.e., the same joint position movements); randomness in the motion is allowed. In other words, we first sample a proper sequence of (articulated) motion representations from the motion representation space of the target action type, and then synthesize the full sequence of motion images from these generated motion representations and the input human figure. At each time stamp, therefore, a particle in the motion representation space and a corresponding image in the appearance space must be sampled/generated simultaneously, and the pair of samples in the two spaces should constrain each other (i.e., remain mutually compatible).

Motivated by this observation, this work proposes a cross-space human motion video generation network with two paths: a forward path that first samples a sequence of low-dimensional motion vectors from a Gaussian Process (an effective latent-space model of human motion) and pairs them with the input person image to form a moving human figure sequence; and a backward path that re-extracts the corresponding latent motion representations from the predicted human images. In the absence of supervision, the reconstructed latent motion representations are expected to be as close as possible to the Gaussian Process samples, yielding a cyclic objective function for mutually constrained generation across the two spaces (motion and appearance). We further propose an alternating sampling/generation algorithm that respects the constraints from both spaces. As a form of self-supervision, the framework no longer needs pairs of ground-truth and input frames sharing the same background and person for model training, which makes the approach very flexible. Extensive experimental results show that the proposed framework successfully generates novel human motion sequences with reasonable visual quality.
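The abstract describes the forward and backward paths only at a high level. The following minimal PyTorch sketch illustrates the cross-space cycle idea under stated assumptions: a Gaussian Process prior samples a temporally correlated sequence of motion latents, a generator maps each latent plus the input person image to a frame, an encoder re-extracts latents from the generated frames, and a cycle loss pulls the re-extracted latents back toward the GP samples. All module names, layer sizes, and tensor shapes here are illustrative assumptions, not the thesis's actual architecture.

    # Hypothetical sketch of the cross-space cyclic objective; not the thesis implementation.
    import torch
    import torch.nn as nn

    def sample_gp_motion_latents(T, d, length_scale=5.0):
        """Sample T motion latent vectors (dim d) from a GP prior with an RBF
        kernel over time, so temporally close frames get correlated latents."""
        t = torch.arange(T, dtype=torch.float32)
        K = torch.exp(-(t[:, None] - t[None, :]) ** 2 / (2 * length_scale ** 2))
        K = K + 1e-4 * torch.eye(T)               # jitter for numerical stability
        L = torch.linalg.cholesky(K)               # T x T
        return L @ torch.randn(T, d)               # correlated samples, T x d

    class MotionToFrameGenerator(nn.Module):
        """Forward path: (motion latent, input person image) -> frame."""
        def __init__(self, d, img_channels=3):
            super().__init__()
            self.fc = nn.Linear(d, 64 * 8 * 8)
            self.deconv = nn.Sequential(
                nn.ConvTranspose2d(64 + img_channels, 32, 4, 2, 1), nn.ReLU(),
                nn.ConvTranspose2d(32, img_channels, 4, 2, 1), nn.Tanh())
        def forward(self, z, person_img):
            # person_img is assumed pre-resized to 8x8 in this toy sketch
            h = self.fc(z).view(-1, 64, 8, 8)
            return self.deconv(torch.cat([h, person_img], dim=1))

    class MotionEncoder(nn.Module):
        """Backward path: generated frame -> re-extracted motion latent."""
        def __init__(self, d, img_channels=3):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(img_channels, 32, 4, 2, 1), nn.ReLU(),
                nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(), nn.Flatten())
            self.fc = nn.Linear(64 * 8 * 8, d)
        def forward(self, frame):
            return self.fc(self.conv(frame))

    # Cyclic objective: re-encoded latents should match the GP-sampled ones.
    T, d = 16, 32
    z = sample_gp_motion_latents(T, d)             # T x d
    person = torch.randn(T, 3, 8, 8)               # input person image, repeated per frame
    G, E = MotionToFrameGenerator(d), MotionEncoder(d)
    frames = G(z, person)                          # T x 3 x 32 x 32 generated frames
    z_rec = E(frames)                              # T x d re-extracted latents
    cycle_loss = nn.functional.mse_loss(z_rec, z)  # cross-space mutual constraint

In a full training loop this cycle loss would presumably be combined with adversarial terms on the generated frames; the sketch only shows how the two spaces constrain each other without paired ground-truth supervision.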
Keywords/Search Tags: Video Generation, Gaussian Process, Generative Adversarial Network