Research On Bottom-up Approaches For Multi-Person Pose Estimation

Posted on: 2022-04-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Li
GTID: 1488306323965429
Subject: Control Science and Engineering
Abstract/Summary:
Human-centered pose estimation technologies, such as head pose estimation, hand pose estimation, and human body pose estimation, attract much research interest because they facilitate many computer vision and multimedia tasks. In this dissertation, we concentrate on the challenging problem of multi-person pose estimation, which aims to recognize and localize the 2D skeleton keypoints of all persons in a given image. Existing approaches fall into two main categories: top-down and bottom-up. Much progress has been made in this field recently, and some top-down approaches have achieved excellent results. However, most of them have complicated structures and low prediction efficiency. Bottom-up approaches are more challenging to design but have clear advantages: they do not rely on a human detector, and they infer multiple human poses in an image more efficiently. In this dissertation, we review previous work and develop bottom-up approaches for multi-person pose estimation. Moreover, we adapt the inference models and loss functions used in human pose estimation and apply them to the problem of 3D hand pose estimation from a single depth image.

The primary research work and contributions of this dissertation are as follows:

1. We present a bottom-up approach to multi-person pose estimation based on Gaussian response heatmaps. Firstly, we employ Gaussian response heatmaps to encode the locations of keypoints and the pairing information between keypoints for all individuals in the image. Secondly, we propose two cascaded convolutional neural networks to infer these heatmaps. The first, PoseNet, consists of multiple stacked residual-inception modules, each of which internally extracts and fuses spatial features at different scales. The second, the Identity Mapping Hourglass Network (IMHN), introduces spatial and channel attention mechanisms to capture the features of skeleton keypoints at different scales and the latent associations between skeleton keypoints. Thirdly, we design a novel focal L2 loss to help the network learn hard samples in the heatmaps. The proposed approach is simple yet comparable to the state of the art on the challenging MSCOCO dataset.

2. We propose a simple yet reliable bottom-up approach to multi-person pose estimation with a good trade-off between accuracy and speed. Given an image, we employ an Hourglass-104 network, which consists of 104 convolutional layers, to infer all the keypoints in the image regardless of which person they belong to, together with the guiding offsets that connect adjacent keypoints belonging to the same person. We then greedily group the candidate keypoints into multiple human poses on the basis of the guiding offsets. Moreover, we revisit the heatmap-based encoding and decoding methods for multi-person keypoint coordinates and identify several factors that affect accuracy. Experiments demonstrate clear performance improvements from the introduced components. Our approach obtains competitive results on the MSCOCO dataset.

3. For the problem of 3D hand pose estimation from a single depth image, we borrow ideas from the research on human pose estimation. Concretely, we propose an end-to-end Hourglass network that uses local regression and feature fusion to directly regress the 3D coordinates of the hand keypoints from the depth-image block containing the hand. We also present a modified L1 loss function for the coordinate regression task. The proposed approach is comparable or superior to the state of the art on the NYU hand pose dataset, and the implemented system is fast enough for real-time applications.
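The heatmap encoding and offset-based grouping mentioned in the abstract can be illustrated with a toy sketch. The code below is not the dissertation's implementation: the function names, the sigma, threshold, and max_dist parameters, and the toy coordinates are all hypothetical, and the guiding offsets are supplied by hand rather than predicted by a network. It only shows, under those assumptions, how keypoints of one type might be rendered as Gaussian responses, recovered as local maxima, and then paired greedily using an offset that points from one keypoint to its adjacent keypoint.

```python
import math


def gaussian_heatmap(width, height, keypoints, sigma=2.0):
    """Render one heatmap for a single keypoint type.

    Each (x, y) keypoint contributes a 2D Gaussian response; where responses
    from different persons overlap, the larger value is kept.
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for kx, ky in keypoints:
        for y in range(height):
            for x in range(width):
                d2 = (x - kx) ** 2 + (y - ky) ** 2
                value = math.exp(-d2 / (2.0 * sigma ** 2))
                if value > heatmap[y][x]:
                    heatmap[y][x] = value
    return heatmap


def find_peaks(heatmap, threshold=0.5):
    """Decode candidates: strict local maxima (8-neighbourhood) above threshold."""
    height, width = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(height):
        for x in range(width):
            v = heatmap[y][x]
            if v < threshold:
                continue
            neighbours = [
                heatmap[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (nx, ny) != (x, y)
            ]
            if all(v > n for n in neighbours):
                peaks.append((x, y, v))
    return peaks


def greedy_group(sources, targets, offsets, max_dist=3.0):
    """Greedily pair source keypoints with target keypoints.

    offsets[i] is the (dx, dy) guiding offset attached to sources[i]; it is
    assumed to point at the adjacent keypoint of the same person. Candidate
    pairs are sorted by how closely source + offset lands on a target, and
    each keypoint is used at most once.
    """
    candidates = []
    for i, ((sx, sy), (dx, dy)) in enumerate(zip(sources, offsets)):
        for j, (tx, ty) in enumerate(targets):
            dist = math.hypot(sx + dx - tx, sy + dy - ty)
            if dist <= max_dist:
                candidates.append((dist, i, j))
    candidates.sort()
    used_sources, used_targets, pairs = set(), set(), []
    for _, i, j in candidates:
        if i in used_sources or j in used_targets:
            continue
        used_sources.add(i)
        used_targets.add(j)
        pairs.append((i, j))
    return sorted(pairs)


# Toy example: two persons, "shoulder" keypoints encoded and decoded again.
shoulder_map = gaussian_heatmap(16, 12, [(4, 5), (11, 6)], sigma=1.5)
shoulders = [(x, y) for x, y, _ in find_peaks(shoulder_map)]
elbows = [(11, 9), (4, 8)]          # hand-written "elbow" candidates
offsets = [(0, 3), (0, 3)]          # hand-written shoulder-to-elbow offsets
pairs = greedy_group(shoulders, elbows, offsets)
```

Sorting all candidate pairs by distance before assigning them is one simple greedy strategy; the grouping rule actually used in the dissertation may differ.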
Keywords/Search Tags: Multi-Person Pose Estimation, Bottom-up, Heatmap, Guiding Offset, Deep Learning, 3D Hand Pose Estimation