| Human pose estimation aims to detect and localise all keypoints of the human body in a given 2D image or video and to connect them according to prior knowledge of the body structure, yielding a basic representation of the human pose. The high-resolution network is one of the most commonly used backbone networks for this task and can extract keypoint features containing multi-scale information from images, but its practical application still faces two difficulties. First, its multi-stage, multi-branch structure has high computational complexity, which limits its performance and efficiency in practice. Second, because it extracts feature information only within local ranges of the image, it fails to capture the long-range spatial dependencies between keypoints of the human body; its performance therefore degrades severely in real-life crowded scenes, where heavy occlusion and mutual interference between human bodies are common. To address these difficulties, this paper conducts an in-depth study of human pose estimation methods based on the high-resolution network. The specific work is as follows: (1) A human pose estimation method based on the Dynamic lightweight High-Resolution Network (Dite-HRNet) is proposed. It makes the high-resolution network lightweight, improving the performance and computational efficiency of the pose estimation model in practical applications. Specifically, a new convolution operation named dynamic split convolution and an adaptive context modeling method are proposed, and two lightweight building blocks designed for high-resolution network structures are built from them. These blocks are used to construct Dite-HRNet, which has tens of times fewer parameters and lower computational cost than the original high-resolution network while still maintaining high accuracy in human pose estimation. (2) A crowd pose estimation method based on the High-Resolution coNteXt network (HRNeXt) is proposed to improve the high-resolution network for the heavy occlusion problem in crowded scenes. HRNeXt better captures the spatial contextual information in the image and the occlusion relationships between human bodies, enabling more accurate estimation of occluded poses and thus improving the accuracy and robustness of human pose estimation in practical scenarios. Specifically, two feed-forward network units designed for vision tasks are proposed and used to construct HRNeXt. The model is evaluated on three public human pose estimation datasets, and its accuracy advantage becomes more significant as the crowding level of the dataset increases, demonstrating its effectiveness against heavy occlusion. (3) Considering the demand for real-time detection in practical applications, the Dite-HRNet and HRNeXt models proposed in this paper are applied to the standard top-down human pose estimation pipeline and deployed on a computer running the Windows operating system and equipped with an external monocular RGB camera. The resulting system reads the RGB image frames captured by the camera and performs multi-person pose estimation in real time. |
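
The standard top-down pipeline mentioned in item (3) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `decode_heatmaps` and `top_down_pose` are hypothetical, the person boxes are assumed to come from any external detector, and `pose_model` stands in for a single-person network (such as Dite-HRNet or HRNeXt) that is assumed to return one heatmap per keypoint. Decoding by a plain argmax is also an assumption made for brevity.

```python
import numpy as np


def decode_heatmaps(heatmaps):
    """Return a (K, 2) array of (x, y) peak positions in heatmap coordinates.

    heatmaps: array of shape (K, h, w), one heatmap per keypoint (assumed format).
    """
    k, h, w = heatmaps.shape
    flat_idx = heatmaps.reshape(k, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (h, w))
    return np.stack([xs, ys], axis=1).astype(np.float64)


def top_down_pose(image, boxes, pose_model):
    """Estimate one pose per detected person (top-down scheme).

    image:      array of shape (H, W, 3).
    boxes:      iterable of integer (x0, y0, x1, y1) person boxes from any detector.
    pose_model: hypothetical callable mapping an image crop to (K, h, w) heatmaps.
    Returns a list of (K, 2) arrays of keypoints in full-image coordinates.
    """
    poses = []
    for x0, y0, x1, y1 in boxes:
        crop = image[y0:y1, x0:x1]            # step 1: crop the person region
        heatmaps = pose_model(crop)           # step 2: single-person estimation
        _, h, w = heatmaps.shape
        pts = decode_heatmaps(heatmaps)       # step 3: heatmap peaks
        # Step 4: rescale from heatmap resolution to the box size, then shift
        # back into the coordinate frame of the full image.
        pts[:, 0] = pts[:, 0] * (x1 - x0) / w + x0
        pts[:, 1] = pts[:, 1] * (y1 - y0) / h + y0
        poses.append(pts)
    return poses
```

In a real-time setting, a loop of this kind would run once per camera frame, so the cost of `pose_model` is multiplied by the number of people detected, which is one reason a lightweight backbone matters for the top-down scheme.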