3D Human Pose Estimation Based On Monocular Image

Posted on: 2021-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: X S He
Full Text: PDF
GTID: 2428330629480073
Subject: Computer technology
Abstract/Summary:
Estimating 3D human pose from monocular images has been an active research topic in recent years. The results can be applied in many fields, such as intelligent human-computer interaction, intelligent video surveillance, virtual reality, and image retrieval. However, predicting the 3D spatial structure of the human body from a monocular image is difficult because of occlusion, the lack of depth information, and the ambiguity of human posture. This thesis predicts 3D human pose from monocular images with a two-stage method: an advanced 2D joint detector first estimates the 2D pose in the image, and the 2D joint coordinates are then used as input to regress the 3D pose. Depending on the type of input data, algorithms for predicting 3D pose from a single image and from a sequence of frames are explored separately. The main work is summarized as follows:

(1) 3D human pose estimation by grouping regression. Drawing on the independence of motion among parts of the human body, the baseline single-image 3D pose estimation algorithm is improved, and a grouping regression network is proposed to make better use of the input data, strengthen the connections between strongly correlated joints, and avoid erroneous influence between weakly correlated joints. First, the main joints to be predicted are divided into five parts: left hand, right hand, left leg, right leg, and main body. Then, networks with the same structure regress the five groups of joints. Finally, the predicted results are combined into a 3D pose. To make the predictions more consistent with normal postures, the output of the grouping regression is further adjusted by a self-constrained grammar network so that it conforms to the inertia of human motion. Experimental results on several public datasets show that grouping regression combined with the self-constrained network outperforms some advanced methods.

(2) 3D human pose estimation based on temporal convolution. To avoid the prediction difficulty caused by joint occlusion in a single image, the 2D joint trajectories in a video sequence are used as the input to the 3D pose prediction model. Following the design of 2D joint detection networks, an intermediate supervision structure is added to the original temporal convolution model, and the slicing function in the original network is replaced by a pooling function. In the intermediate supervision structure, each temporal convolution module outputs a 3D pose that contributes to the network's loss function, while the pooling function preserves the complete features of the data added through the residual connections. In addition, to reduce the influence of noise in the 2D joint trajectories on the predictions, a 2D joint filter is designed to locate and correct obvious noise. Results on a commonly used 3D human pose dataset show that the improved model achieves higher accuracy and strong robustness to noise.
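The data flow of the grouping regression step in part (1) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the 17-joint index layout, the names `JOINT_GROUPS`, `placeholder_regressor`, and `grouped_lift`, and the choice to pass the full 2D pose to every group regressor are all hypothetical, and the placeholder regressor stands in for a trained network.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]

# Hypothetical 17-joint layout; these indices are an assumption made for
# illustration and are not taken from the thesis.
JOINT_GROUPS: Dict[str, List[int]] = {
    "main_body": [0, 7, 8, 9, 10],
    "right_leg": [1, 2, 3],
    "left_leg": [4, 5, 6],
    "left_hand": [11, 12, 13],
    "right_hand": [14, 15, 16],
}

# A group regressor maps (full 2D pose, joint ids of its group) to 3D joints.
GroupRegressor = Callable[[Sequence[Point2D], List[int]], List[Point3D]]


def placeholder_regressor(pose_2d: Sequence[Point2D],
                          joint_ids: List[int]) -> List[Point3D]:
    """Stand-in for a trained per-group network: assigns depth 0 to each joint."""
    return [(pose_2d[j][0], pose_2d[j][1], 0.0) for j in joint_ids]


def grouped_lift(pose_2d: Sequence[Point2D],
                 regressors: Dict[str, GroupRegressor]) -> List[Point3D]:
    """Regress each joint group separately, then merge into one 3D pose."""
    merged: List[Optional[Point3D]] = [None] * len(pose_2d)
    for name, joint_ids in JOINT_GROUPS.items():
        preds = regressors[name](pose_2d, joint_ids)
        if len(preds) != len(joint_ids):
            raise ValueError(f"group {name!r} returned the wrong joint count")
        for j, p in zip(joint_ids, preds):
            merged[j] = p
    if any(p is None for p in merged):
        raise ValueError("joint groups do not cover every joint")
    return [p for p in merged if p is not None]


# Usage: one regressor of the same structure per group, as in the abstract.
pose_2d = [(float(i), float(2 * i)) for i in range(17)]
regressors = {name: placeholder_regressor for name in JOINT_GROUPS}
pose_3d = grouped_lift(pose_2d, regressors)
```

In a real system each entry of `regressors` would be a separately trained network with the same architecture, and the merged pose would then be passed to a refinement stage such as the self-constrained network described above.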
Keywords/Search Tags: 3D human pose, Two-stage, Grouping regression, Temporal convolution, Self-constrained network, 2D joint filter