
Human Pose Estimation For Resource-Limited Scenes

Posted on: 2022-07-29
Degree: Master
Type: Thesis
Country: China
Candidate: Z Zhang
Full Text: PDF
GTID: 2518306725981529
Subject: Computer technology
Abstract/Summary:
Human pose estimation, also known as keypoint detection, is a fundamental research topic in computer vision and a key technology for a range of downstream applications. In recent years, thanks to the development of deep learning and the growth of available computing power, research on human pose estimation has made broad progress at home and abroad. However, many methods ignore the cost of model deployment while pursuing better performance. In some application scenarios, limited computing and storage resources impose stricter requirements on both the speed and the accuracy of the human pose estimation model. Motivated by these constraints in resource-limited scenarios, this thesis pursues two ideas. The first is to further improve the prediction accuracy of a model without significantly increasing the number of network parameters. The second is to greatly reduce the complexity of the network without an obvious loss of prediction accuracy. Specifically, the main contents of this thesis are as follows:

1. The belief maps predicted by human pose estimation models almost always have low resolution. To address this, the thesis proposes a model-agnostic belief map enhancement method named Enhance Net. Based on a simple and lightweight design, Enhance Net can be applied to most popular methods. It processes the predicted belief maps to correct some of the prediction errors of the base model, thereby improving prediction accuracy. Experiments carried out on the MPII and COCO datasets show that the improvement from Enhance Net is general and significant. In addition, with DLCM and HRNet as the base models, Enhance Net sets a new state-of-the-art (SOTA) score on the two datasets, respectively. Because it is lightweight and efficient, Enhance Net can be combined with a base model in resource-limited scenes to improve the generalization performance of the original model without introducing much computational burden.

2. The thesis explores lightweight solutions for human pose estimation. First, based on depthwise convolution and a global context attention mechanism, it redesigns a model-agnostic lightweight module that can directly replace the basic component commonly used in pose estimation networks, thereby compressing the model. A new lightweight pose estimation network, named LPN, is then proposed. LPN is the most lightweight and efficient among existing methods. To overcome the difficulty of training a lightweight network, the thesis proposes an unconventional iterative training strategy, which not only compensates for the lack of a pre-trained model but also changes the learning rate periodically to bring out the full potential of LPN. In addition, the thesis improves on the usual practice in the human pose estimation framework by proposing a new general post-processing method that yields more accurate predictions. LPN has fewer parameters and FLOPs than other methods, yet it still achieves comparable prediction accuracy. Most importantly, LPN performs very well in actual inference speed, which makes it better suited to resource-limited scenes.

3. The thesis also develops a multi-person real-time human pose estimation system based on the lightweight pose network. The system captures real-time video through a webcam, uses the YOLO algorithm to detect human bounding boxes, uses the LPN-TensorRT engine to estimate the human pose, draws the human skeleton according to the recognized pose, and then overlays the result on the original video in real time. Compared with the open-source OpenPose project, the system achieves a higher FPS, and its GPU memory consumption is only 379 MB, less than one tenth of that of OpenPose, which makes it better suited to resource-limited scenes.
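To make the compression effect of depthwise convolution concrete, the following minimal sketch compares the parameter and multiply-accumulate (MAC) counts of a standard convolution with those of a depthwise convolution followed by a 1 x 1 pointwise convolution. It is not the lightweight module of the thesis, which also uses global context attention; the function names and the layer sizes (256 channels, 3 x 3 kernel, 64 x 48 output) are assumptions chosen only for illustration.

```python
# Illustrative only: layer sizes are assumptions, not values from the thesis.

def standard_conv_cost(c_in, c_out, k, h_out, w_out):
    """Return (parameters, MACs) of a k x k standard convolution, bias omitted."""
    params = c_in * c_out * k * k
    macs = params * h_out * w_out  # each weight is used once per output position
    return params, macs


def depthwise_separable_cost(c_in, c_out, k, h_out, w_out):
    """Return (parameters, MACs) of a k x k depthwise conv plus a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    params = depthwise + pointwise
    macs = params * h_out * w_out
    return params, macs


if __name__ == "__main__":
    std_params, std_macs = standard_conv_cost(256, 256, 3, 64, 48)
    sep_params, sep_macs = depthwise_separable_cost(256, 256, 3, 64, 48)
    print("standard:  params =", std_params, " MACs =", std_macs)
    print("separable: params =", sep_params, " MACs =", sep_macs)
    print("reduction factor =", round(std_params / sep_params, 2))
```

For these assumed sizes the separable layer needs roughly 8.7 times fewer parameters and MACs, which is the kind of saving that makes replacing standard convolutions attractive for resource-limited deployment.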
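The abstract states that the iterative training strategy changes the learning rate periodically but does not give the schedule. As a stand-in, the sketch below shows one common periodic schedule, cosine annealing with restarts; the function name `cyclic_cosine_lr` and all numeric values are hypothetical and should not be read as the schedule used for LPN.

```python
import math

# Assumption: cosine annealing with restarts is used here only as an example
# of a periodically changing learning rate; the thesis schedule may differ.

def cyclic_cosine_lr(step, cycle_len, lr_max, lr_min=0.0):
    """Learning rate at a given step: decays from lr_max toward lr_min
    along a cosine curve, then jumps back to lr_max every cycle_len steps."""
    position = step % cycle_len  # position inside the current cycle
    cosine = 0.5 * (1.0 + math.cos(math.pi * position / cycle_len))
    return lr_min + (lr_max - lr_min) * cosine


if __name__ == "__main__":
    # Hypothetical values: 3 cycles of 100 steps each.
    for step in range(0, 300, 25):
        print(step, round(cyclic_cosine_lr(step, 100, 1e-3, 1e-5), 6))
```

In a PyTorch training loop, `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts` provides a comparable schedule without hand-written code.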
Keywords/Search Tags:Deep Learning, Convolutional Neural Network, Human Pose Estimation, Belief Map Enhance, Lightweight Model