In recent years, film and television animation and virtual reality technology have matured steadily. With the development of three-dimensional (3D) human modeling methods, realistic 3D human models can now be generated from parametric models trained with deep learning, replacing modeling workflows based on industrial modeling software or CG techniques and avoiding their low efficiency and high production cost. Reconstruction of 3D human body meshes from monocular images currently shows good results; existing approaches are mainly divided into two-stage methods and direct estimation methods. However, these methods still have problems: small deviations in the predicted human body parameters lead to obvious misalignment between the estimated mesh and the person in the original image, so reconstruction accuracy is low; and human models reconstructed from video often exhibit unsmooth motion, with the pose changing abruptly in individual frames. To address these problems, this paper optimizes in two directions: improving accuracy and eliminating jitter. This paper adopts a generative adversarial network (GAN) structure in which a generator and a discriminator are trained adversarially. The generator predicts human body parameters, and the discriminator distinguishes real human motion from the motion predicted by the generator. This adversarial training encourages the generator to produce kinematically plausible results. A feature pyramid network is used to extract multi-scale features and enrich semantic information, and the predicted parameters are explicitly corrected in the deep regressor based on image-mesh alignment. In the feedback loop, given the currently predicted parameters, mesh-alignment information is extracted from higher-resolution features and the parameters are corrected iteratively to improve accuracy. In addition, a human body parameter constraint loss function is added that checks whether the parameter difference between two frames exceeds a threshold range. Through iterative optimization, this loss effectively limits changes in the pose and shape parameters across the video sequence and reduces jitter and abrupt motion changes in the human body model. A gated recurrent unit (GRU) is used in the discriminator to learn the temporal information hidden in the video sequence, which helps improve the continuity and smoothness of the reconstruction results. Through qualitative and quantitative analysis, this paper verifies the effectiveness of the proposed optimization method on Human3.6M, 3DPW, MPI-INF-3DHP, and other datasets and on multiple benchmarks. Ablation experiments are carried out to analyze the importance of the mesh-alignment features and the human body parameter constraint.
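
The parameter constraint described above, which penalizes inter-frame parameter differences that exceed a threshold, might be sketched as follows. This is only an illustrative sketch, not the formulation used in the paper: the function name `parameter_constraint_loss`, the squared hinge form, the use of consecutive frames, and the threshold value are all assumptions made for the example.

```python
import numpy as np


def parameter_constraint_loss(params, threshold):
    """Hinge-style penalty on frame-to-frame parameter changes (assumed form).

    params: array-like of shape (T, D), one row of pose/shape parameters per frame.
    threshold: hypothetical tolerance; changes at or below it are not penalized.
    """
    params = np.asarray(params, dtype=float)
    if params.shape[0] < 2:
        # A single frame has no temporal difference to constrain.
        return 0.0
    # Absolute change of each parameter between consecutive frames: shape (T-1, D).
    diff = np.abs(params[1:] - params[:-1])
    # Only the part of each change that exceeds the threshold contributes.
    excess = np.maximum(diff - threshold, 0.0)
    return float(np.mean(excess ** 2))


# A smoothly varying sequence stays inside the threshold and is not penalized.
smooth = np.linspace(0.0, 1.0, 11)[:, None]
smooth_loss = parameter_constraint_loss(smooth, threshold=0.2)

# A sequence with one abrupt jump is penalized for the jump and the recovery.
jittery = np.array([[0.0], [0.1], [1.0], [0.2]])
jitter_loss = parameter_constraint_loss(jittery, threshold=0.2)
```

Under these assumptions, the loss is zero whenever every parameter changes by no more than the threshold between consecutive frames, so minimizing it discourages only abrupt changes and leaves ordinary motion unconstrained.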