
Research On Monocular Human Pose Estimation Based On Deep Learning

Posted on: 2023-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: Q Y Zhang
GTID: 2568306818497034
Subject: Control Science and Engineering
Abstract/Summary:
Human pose estimation is an important research direction in computer vision. Its goal is to locate the keypoints in an image and group them into the corresponding human poses. It is the basis and prerequisite of higher-level visual tasks such as action recognition, person re-identification and pedestrian detection. In recent years, the rapid development of deep learning and convolutional neural networks has advanced research on human pose estimation, and it is now widely used in intelligent monitoring, human-computer interaction, motion analysis and other fields.

However, human pose estimation still faces problems and challenges in practical applications. On the one hand, occlusion and pose distortion caused by human interaction in complex scenes degrade model performance, and extracting effective spatial and semantic information is the key to improving keypoint localization. On the other hand, existing pose estimation methods focus on accuracy and neglect the balance between speed and accuracy, so they cannot be applied in resource-constrained settings. To address these two problems, this thesis studies human pose estimation within the deep learning framework from three aspects: designing a high-precision network structure, a lightweight model, and an efficient network architecture. The main research results are as follows:

(1) Research on how to design a high-precision network structure that extracts more discriminative and expressive features under the deep learning framework, thereby improving network accuracy. To reduce localization and inference errors in scenes with interference such as pose diversity, a multi-resolution spatial and context-aware network for human pose estimation is proposed. First, a detail enhancement module fuses shallow features into the output high-resolution features through residual skip connections to enhance detail information. Then, a spatial self-attention module computes the correlation between feature positions to obtain the global dependencies of local features. Finally, an information supplementation module is designed for the low-resolution deep features: skip connections and parallel dilated convolution branches supplement the deep features with spatial and semantic information to enrich their expressive ability. The model achieves an accuracy of 77.0% on the COCO validation dataset and 91.0% on the MPII validation dataset.

(2) Research on how to design a lightweight network that improves model efficiency under the deep learning framework. Because existing human pose estimation algorithms emphasize model accuracy and neglect model efficiency, a lightweight human pose estimation model based on a feature pyramid structure is proposed. The model uses EfficientNetV2 as the backbone network and adds a semantic embedding fusion module and a pose refine module to build a lightweight network structure. First, sub-pixel convolution replaces all nearest-neighbor upsampling operations used to restore resolution in the model, so that channel information supplements spatial information and the loss of feature information is reduced. Second, a semantic embedding fusion module, designed around the semantic difference between shallow and deep features, performs cross-scale feature fusion. Finally, a pose refine module is applied to the output features; it uses a spatial attention mechanism to adaptively supplement important information, thereby optimizing keypoint localization. Without pre-training, the lightweight model reduces the parameter count and computational cost of the high-precision network by two-thirds and achieves an accuracy of 75.1% on the COCO validation dataset and 89.8% on the MPII validation dataset.

(3) Research on how to design an efficient network architecture, that is, a network structure that is both accurate and lightweight under the deep learning framework. Because the lightweight model maintains efficiency but loses some accuracy, a human pose estimation network based on collaborative learning and feature constraint is proposed. First, the preprocessed data is fed simultaneously into two identical lightweight networks to extract features. The output features are then aggregated to generate high-quality soft targets, which, together with the ground truth labels, supervise each subnet and improve prediction performance. A feature constraint is also used to limit the keypoint positioning range and further improve positioning accuracy. Without additional performance overhead, the model achieves an accuracy of 75.5% on the COCO validation dataset and 90.3% on the MPII validation dataset.
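
As an illustration of the sub-pixel convolution step named in result (2), the following is a minimal sketch of the channel-to-space rearrangement (often called pixel shuffle) that such a layer applies after an ordinary convolution. It is not the thesis implementation: the function name `pixel_shuffle`, the nested-list tensor layout (channels, height, width), and the upscale factor `r` are assumptions made for this example, and the preceding convolution is omitted.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Each group of r*r input channels supplies the r x r block of output
    pixels at every spatial position, so resolution is restored from
    channel information rather than by copying neighbouring pixels.
    """
    channels_in = len(x)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = channels_in // (r * r)
    h, w = len(x[0]), len(x[0][0])
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):          # row offset inside the r x r block
            for j in range(r):      # column offset inside the r x r block
                src = x[c * r * r + i * r + j]
                for y in range(h):
                    for z in range(w):
                        out[c][y * r + i][z * r + j] = src[y][z]
    return out


# Four 1x1 channels become a single 2x2 feature map (assumed toy input).
features = [[[1]], [[2]], [[3]], [[4]]]
print(pixel_shuffle(features, 2))  # [[[1, 2], [3, 4]]]
```

Unlike nearest-neighbor upsampling, which repeats each value r x r times, every output pixel here comes from a distinct channel. In a trained network those channels would be produced by a learned convolution.
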
Keywords/Search Tags: human pose estimation, high-precision network, lightweight model, collaborative learning