Human Pose And Action Analysis Based On Deep Learning

Posted on: 2024-09-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y B Xiao
GTID: 1528306944466494
Subject: Electronic Science and Technology
Abstract/Summary:
Human pose estimation and action recognition are fundamental tasks that allow computers to detect and understand human activities, and they serve as necessary steps for high-level vision tasks such as human-computer interaction (HCI) and virtual reality (VR). Over the past decade, with the rapid development of high-performance graphics processing units (GPUs) and the collection of large amounts of labeled training data, deep-learning-based pose estimation has achieved significant performance improvements over methods using hand-crafted features, and it has been applied in many real-world scenarios. However, shortcomings remain: poorly designed convolutional network structures, high computational complexity, redundant pipelines, and reliance on many hand-designed rules. As a result, existing methods struggle with complex scenes, and their poor real-time performance and limited generalizability hinder further development and application. To tackle these issues, this work builds on deep learning and approaches them from both the research and the application perspectives. We explore network design, algorithm pipelines, and a universal paradigm for human pose estimation and action recognition, with the goal of building a more robust, efficient, and universal pose estimation system for further research and application. The main contributions are as follows:

1. A spatial-preserving and content-aware network for 2D single-person pose estimation. Existing networks have difficulty handling complex scenes involving occlusion, blurring, illumination change, distortion, and similar conditions. We first study which network characteristics are needed to cope with these scenarios. Based on these required properties, we propose a new framework that preserves spatial resolution while maintaining a large receptive field, and that selects relatively important features from different levels through a spatial content-aware mechanism, considerably improving performance. Extensive experiments on the MPII, LSP, and FLIC human pose estimation benchmarks demonstrate the effectiveness of the network.

2. A novel body representation and a compact, powerful single-stage multi-person pose estimation network, termed AdaptivePose, which addresses the high computational cost and redundant two-stage pipelines of the top-down and bottom-up paradigms. We represent human parts as adaptive points, yielding a fine-grained body representation. This representation can sufficiently encode diverse pose information and effectively model the relationship between a human instance and its keypoints in a single forward pass. AdaptivePose is applied to both 2D and 3D multi-person pose estimation to verify its effectiveness and generalizability.

3. Off-the-shelf single-stage multi-person pose regression methods generally use the instance score to indicate pose quality when selecting pose candidates. We propose learning a quality-aware label representation for pose regression to calibrate the inconsistency between the instance classification score and the pose regression quality. We further propose explicitly encoding the predicted structural pose information into the instance feature so that the network can perceive pose quality. The proposed method achieves state-of-the-art results, demonstrating the strong potential of single-stage multi-person pose regression.

4. A sparse end-to-end multi-person pose regression framework, which represents each human instance with sparse learnable part-level queries associated with an instance-level query. Existing methods rely on dense representations to preserve the spatial details and body structure needed for precise keypoint localization. However, the dense paradigm is hard to unify with skeleton-based action recognition and also introduces complex, redundant post-processing. The proposed framework uses sparse part-level queries to encode spatial details and structural information for precise keypoint localization, and avoids hand-crafted post-processing. Furthermore, the sparse representation can unify pose estimation and action recognition in a general framework.
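Item 2 describes representing human parts as adaptive points so that an instance and its keypoints are linked in a single forward pass. The sketch below is one hypothetical way such a representation could be decoded, assuming (this abstract does not confirm it) that the network predicts a center heatmap, per-pixel offsets from the center to P adaptive part points, and offsets from each part point to the keypoints assigned to it. The names `find_centers`, `decode_instance`, `decode`, and `kpt_to_part` are illustrative only.

```python
import numpy as np


def find_centers(heatmap, threshold=0.3):
    """Return (x, y) peaks of a center heatmap via 3x3 local-maximum suppression."""
    h, w = heatmap.shape
    padded = np.pad(heatmap.astype(float), 1, mode="constant", constant_values=-np.inf)
    # Maximum over each pixel's 3x3 neighborhood, built from nine shifted views.
    neigh = np.max(np.stack([padded[dy:dy + h, dx:dx + w]
                             for dy in range(3) for dx in range(3)]), axis=0)
    ys, xs = np.nonzero((heatmap == neigh) & (heatmap >= threshold))
    return np.stack([xs, ys], axis=1).astype(float)


def decode_instance(center, part_offsets, kpt_offsets, kpt_to_part):
    """Hypothetical two-hop decoding: center -> adaptive part points -> keypoints.

    center: (2,); part_offsets: (P, 2) center-to-part offsets;
    kpt_offsets: (K, 2) part-to-keypoint offsets;
    kpt_to_part: (K,) index of the part point each keypoint is regressed from.
    """
    parts = center[None, :] + part_offsets          # (P, 2) adaptive part points
    keypoints = parts[kpt_to_part] + kpt_offsets    # (K, 2) final keypoints
    return parts, keypoints


def decode(heatmap, part_off, kpt_off, kpt_to_part, threshold=0.3):
    """Decode all instances; part_off is (H, W, P, 2), kpt_off is (H, W, K, 2)."""
    poses = []
    for x, y in find_centers(heatmap, threshold):
        xi, yi = int(x), int(y)
        # All offsets are read at the instance's own center pixel.
        _, kpts = decode_instance(np.array([x, y]), part_off[yi, xi],
                                  kpt_off[yi, xi], kpt_to_part)
        poses.append(kpts)
    return poses
```

Because every keypoint is reached through offsets anchored at its own instance center, this kind of decoding needs no separate person detector or keypoint grouping step, which is the single-stage property the abstract emphasizes.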
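Item 3 calibrates the instance score against pose regression quality. A common measure of pose quality is the COCO Object Keypoint Similarity, OKS = Σ_i exp(−d_i² / (2 s² k_i²)) δ(v_i > 0) / Σ_i δ(v_i > 0), where d_i is the distance between predicted and ground-truth keypoint i, s² is the instance area, k_i = 2σ_i, and v_i marks labeled keypoints. The sketch below assumes, as an illustration rather than the dissertation's exact formulation, that the OKS of each matched prediction is used as a soft classification target and trained with a soft binary cross-entropy. `quality_aware_targets` and `soft_bce` are hypothetical helper names.

```python
import numpy as np

# Per-keypoint sigmas used by the COCO keypoint evaluation (17 keypoints).
COCO_SIGMAS = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72,
                        .62, .62, 1.07, 1.07, .87, .87, .89, .89]) / 10.0


def oks(pred, gt, visible, area, sigmas=COCO_SIGMAS, eps=1e-9):
    """Object Keypoint Similarity between one predicted and one ground-truth pose.

    pred, gt: (K, 2) arrays of (x, y); visible: (K,) bool mask of labeled
    keypoints; area: ground-truth instance area, playing the role of s^2.
    """
    if not np.any(visible):
        return 0.0
    d2 = np.sum((pred - gt) ** 2, axis=-1)       # squared distance d_i^2
    k2 = (2.0 * sigmas) ** 2                     # k_i = 2 * sigma_i, as in COCO
    sim = np.exp(-d2 / (2.0 * (area + eps) * k2))
    return float(sim[visible].mean())            # average over labeled keypoints


def quality_aware_targets(num_preds, matches, pred_poses, gt_poses, gt_visible, gt_areas):
    """Hypothetical soft labels: OKS for matched predictions, 0 for the rest.

    matches: iterable of (pred_index, gt_index) pairs from any assignment step.
    """
    targets = np.zeros(num_preds)
    for p, g in matches:
        targets[p] = oks(pred_poses[p], gt_poses[g], gt_visible[g], gt_areas[g])
    return targets


def soft_bce(logits, targets):
    """Numerically stable binary cross-entropy with soft targets in [0, 1]."""
    logits = np.asarray(logits, dtype=float)
    targets = np.asarray(targets, dtype=float)
    loss = np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    return float(loss.mean())
```

Under this assumption, a well-localized pose is pushed toward a high score and a poorly localized one toward a low score, so ranking candidates by the predicted score approximates ranking them by regression quality.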
Keywords/Search Tags: Deep learning, Pose estimation, Skeleton-based action recognition, Single-stage, Quality-aware, Sparse representation