Human images carry rich information, and the results of processing and analyzing them can be used in fields such as video surveillance, smart homes, and human-computer interaction. Human pose estimation is an active research topic in human image processing, and it supports downstream tasks such as pedestrian re-identification, human image generation, and human motion tracking. It is therefore worth investigating. In practice, human pose estimation is commonly carried out in scenes with scale variation, background confusion, and occlusion, and these complex conditions can lead to incorrect or missed detection of human keypoints. To address these problems, this paper proposes several pose estimation algorithms. The main research work is as follows:

(1) To address the scale variation challenge, a pose estimation network based on multi-scale position enhancement is proposed. Because existing multi-scale pose estimation networks fail to fuse information from different scales effectively, a multi-scale adaptive fusion unit is proposed to explore the information interaction between scales. To address the semantic discontinuity that arises when features at different levels are fused in a top-down manner, a position enhancement module is proposed to make the generated human pose more reasonable. Finally, a global context block is introduced to improve the overall accuracy of the network. Both quantitative and qualitative experimental results demonstrate that the proposed method not only improves performance effectively but also detects more small-scale keypoints.

(2) To address the challenge of background confusion, a global semantic guided pose estimation network is proposed. First, a global feature refinement module is designed to refine the
features from the top layer of the ResNet network, which enhances the representation capability of the features in this layer and retains more detailed information about the background and the target human body. Then, guided by the global features, a multi-branch semantic aggregation layer is proposed to progressively aggregate high-level, low-level, and global features. This yields more discriminative features for distinguishing the target human body from the background region and avoids incorrectly localizing target keypoints in the background. Both quantitative and qualitative experimental results demonstrate that the proposed method not only outperforms other algorithms but also alleviates the effect of background confusion.

(3) To address the occlusion challenge, a pose estimation network based on graph structure inference is proposed. The network consists of a multi-stage feature fusion network and a graph pose refinement network. The multi-stage feature fusion network aggregates features from different stages using an attention feature fusion module to extract rich image context information and output more accurate initial pose coordinates. The graph pose refinement network refines the initial pose, models human structure information, and dynamically adjusts the connection relationships between limbs using a modulated graph convolution layer and a modulated graph convolution attention block. The experimental results demonstrate that the proposed algorithm not only achieves higher accuracy but also infers more invisible keypoints.