
Research On Human Pose Estimation Algorithms Based On Deep Learning

Posted on: 2024-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: X B Jia
GTID: 2568306941493454
Subject: Electronic information
Abstract/Summary:
With the development of deep learning, computer vision has seen significant breakthroughs and innovations. With object classification and detection largely addressed, recognizing human actions and analyzing human-centered images has become an important step toward higher-level computer vision. Human pose estimation is the foundation of this research; it has broad application prospects and has received widespread attention. However, because joint poses are highly diverse, most traditional regression methods struggle to predict the positions of human keypoints accurately. Heatmap-based models that currently perform well are difficult to deploy on edge devices. Other convolutional network designs have drawbacks such as not being end-to-end and requiring long training times, so better model structures still need to be explored. This thesis therefore studies human pose estimation algorithms for different needs, with respect to detection accuracy and operational efficiency.

The thesis first treats pose estimation as an unordered set prediction task and adopts ideas from object detection to solve it. On this basis, it proposes TPNet, a dual-branch multi-scale framework composed of a convolutional neural network and a Transformer. TPNet fuses the multi-level feature maps produced by the convolutional backbone to improve feature reuse and enrich the multi-scale information available to the network. In addition, a heatmap branch is added during training to supervise the generation of intermediate feature maps. The heatmap branch can be discarded at inference, so it adds no runtime cost. Finally, anchor points are used in the network to accelerate training convergence and improve prediction accuracy. With ResNet as the backbone, TPNet achieves 71.1 AP on the COCO dataset and 87.4 PCKh on the MPII dataset. The experimental results show that the method significantly improves the accuracy of pose regression without adding extra computation.

Existing large-capacity models usually produce accurate detection results, but deploying them in practical application scenarios faces many difficulties. To address this issue, the thesis introduces a re-parameterized network structure and combines it with the idea of residual likelihood estimation to design a lightweight pose regression algorithm named RepNet. The method trains with a carefully designed convolutional architecture, then simplifies the network by reconstructing its parameters at every level, which reduces inference time and improves the operational efficiency of the detection task. At the same time, the output is modeled with maximum likelihood estimation, and a flow-based generative model learns an invertible transformation of the underlying distribution to improve prediction performance. RepNet achieves a detection accuracy of 66.1 AP on the COCO dataset with an inference time of 15 ms on the GPU and 40 ms on the CPU, addressing the trade-off between model accuracy and computational complexity and making a modest contribution to research on lightweight pose estimation.
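The abstract does not give RepNet's exact block design, so the following is only a minimal sketch of the general idea behind structural re-parameterization, assuming a RepVGG-style training block with a 3x3 convolution, a 1x1 convolution, and an identity shortcut. The helpers `conv2d` and `merge_branches` are hypothetical names introduced for illustration, and batch normalization (which such blocks normally fold in as well) is left out to keep the example short.

```python
import random

def conv2d(x, w, b):
    """Stride-1, zero-padded 'same' cross-correlation in pure Python.
    x: [C_in][H][W], w: [C_out][C_in][k][k], b: [C_out]."""
    k = len(w[0][0])
    p = k // 2
    h, wd = len(x[0]), len(x[0][0])
    out = []
    for o in range(len(w)):
        plane = [[b[o]] * wd for _ in range(h)]
        for c in range(len(x)):
            for i in range(h):
                for j in range(wd):
                    for di in range(k):
                        for dj in range(k):
                            ii, jj = i + di - p, j + dj - p
                            if 0 <= ii < h and 0 <= jj < wd:
                                plane[i][j] += w[o][c][di][dj] * x[c][ii][jj]
        out.append(plane)
    return out

def merge_branches(w3, b3, w1, b1):
    """Fold a 3x3 branch, a 1x1 branch and an identity shortcut into one
    3x3 kernel and bias (assumes C_in == C_out; hypothetical helper)."""
    n = len(w3)
    merged = [[[row[:] for row in w3[o][c]] for c in range(n)] for o in range(n)]
    for o in range(n):
        for c in range(n):
            merged[o][c][1][1] += w1[o][c][0][0]  # 1x1 kernel sits at the 3x3 centre
        merged[o][o][1][1] += 1.0                 # identity = per-channel centre weight of 1
    bias = [b3[o] + b1[o] for o in range(n)]
    return merged, bias

rng = random.Random(0)
C, H, W = 3, 5, 5
x = [[[rng.gauss(0, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w3 = [[[[rng.gauss(0, 1) for _ in range(3)] for _ in range(3)]
       for _ in range(C)] for _ in range(C)]
w1 = [[[[rng.gauss(0, 1)]] for _ in range(C)] for _ in range(C)]
b3 = [rng.gauss(0, 1) for _ in range(C)]
b1 = [rng.gauss(0, 1) for _ in range(C)]

# Training-time block: three parallel branches summed together.
y3, y1 = conv2d(x, w3, b3), conv2d(x, w1, b1)
train_out = [[[y3[c][i][j] + y1[c][i][j] + x[c][i][j] for j in range(W)]
              for i in range(H)] for c in range(C)]

# Inference-time block: a single 3x3 convolution with merged parameters.
w, b = merge_branches(w3, b3, w1, b1)
deploy_out = conv2d(x, w, b)

max_diff = max(abs(train_out[c][i][j] - deploy_out[c][i][j])
               for c in range(C) for i in range(H) for j in range(W))
print("max abs difference:", max_diff)
```

Because convolution is linear, the merged single-branch layer reproduces the multi-branch output up to floating-point rounding, which is why a model of this kind can train with the richer structure and deploy with the simpler one.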
Keywords/Search Tags: Human Pose Estimation, Transformer Network, Multi-scale Fusion, Structural Re-parameterization, Residual Likelihood Estimation