Research On Human Pose Estimation Methods Based On Convolutional Neural Networks

Posted on: 2021-01-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F Zhang
Full Text: PDF
GTID: 1368330611955007
Subject: Computer application technology
Abstract/Summary:
Human pose estimation is an active research field in computer vision. Earlier methods rely on hand-crafted representations and shallow recognition models learned independently, and hence often yield sub-optimal performance. With the development of deep learning, convolutional neural network (CNN) based methods dominate recent progress by jointly learning more discriminative features and inference models in an end-to-end fashion. Although CNN-based methods have become the mainstream of human pose estimation and have made great progress, some practical problems remain. Firstly, existing works mainly focus on improving the accuracy of human pose estimation methods while ignoring the trade-off between efficiency and accuracy, which is the most critical issue for human pose estimation methods. Secondly, researchers have not recognized the significance of quantization error and optimization discrepancy for the performance of human pose estimation, which is a key issue for high-precision human pose estimation. To solve these two problems, this dissertation carries out research from three aspects (efficient network architecture, light-weight model training strategy, and high-precision localization). The main work and contributions are summarized as follows:

(1) Existing methods do not consider model efficiency when designing human pose estimation models. In the third chapter, we propose an efficient pose estimation architecture called Hierarchical Context Network (HCN). Firstly, we study pre-attentive processing in human visual understanding. Then, we integrate the pre-attentive mechanism into the design of the network architecture to create a multi-stage HCN. The network architecture consists of the stage-shared low-level feature extraction module, the context incorporation module, the intermediate context learning module, etc. The whole network has several context stages and one prediction stage. All stages share the same structure, with a low-level feature module, a context incorporation module, a backbone module and an intermediate context learning module. The features from the stage-shared low-level feature extraction module in the current stage and the context information from the preceding stage are fed into the context incorporation module to produce a context-enriched representation. The sub-network in each stage is supervised by the intermediate context learning module. In the HCN, the low-resolution sub-networks find the coarse locations of body parts efficiently, and the high-resolution sub-networks localize the body parts precisely. The multi-granularity design of the HCN architecture helps reduce the computational cost while preserving performance. Finally, the effectiveness of the proposed modules and of the multi-granularity design is evaluated on two human pose benchmarks.

(2) There is a lack of generic model reduction methods in the human pose estimation field. Even if we obtain a light-weight human pose estimation model, we still have no way to preserve or boost its performance. In the fourth chapter, we propose a fast human pose estimation method based on pose distillation to solve this problem. Firstly, we investigate the redundancy of the human pose estimation model by analysing a set of stacked hourglass network variants and propose a model-agnostic method to reduce the model parameters and the computational cost. Secondly, a light-weight human pose estimation model with fewer parameters and less computational cost typically suffers from performance degradation. To boost the performance of the light-weight model, we introduce the idea of knowledge distillation into human pose estimation and design a pose-estimation-specific distillation method, called pose distillation, to transfer the dark knowledge from the heavy teacher model to the light-weight student model. To shed light on what knowledge the teacher model transfers and why the novel pose distillation method helps improve model generalization, we provide a visualization analysis, in which a number of possible reasons are given to clarify the pose knowledge in the distillation process. Finally, comprehensive experiments are conducted to evaluate the effectiveness of the proposed fast pose distillation method.

(3) In the fifth chapter, we propose a distribution-aware coordinate representation of keypoints to alleviate quantization error in the coordinate representation of human pose estimation. Firstly, we identify the defects of coordinate representation in human pose estimation and determine why quantization error arises in it. Secondly, we propose generating accurate heat map distributions for unbiased model training in the coordinate encoding stage, and we derive a distribution-aware heat map decoding method for the coordinate decoding stage, designed to comprehensively account for the distribution information of heat map activation via a second-order Taylor-expansion based distribution approximation. Finally, extensive experiments are conducted to verify the effectiveness of the proposed method from two perspectives (coordinate encoding and decoding). The significant performance advantages on different backbones show the generality of our method.

(4) To moderate the optimization discrepancy between the heat map representation and the coordinate regression in the integral pose regression method and to improve model accuracy, we propose an accurate location-adaptive integral pose regression method in the sixth chapter. Firstly, we analyze the optimization discrepancy in the integral pose regression method under the assumption that the network can predict a perfect heat map. Specifically, we perform a consistency assessment by comparing the output of the integral pose regression method, taking the perfect heat map as the model input, with the coordinate encoded in the perfect heat map, which reveals the existence of the optimization discrepancy. Secondly, we propose a Location Adaptive Softmax (LAS) model to resolve the discrepancy between the heat map representation and coordinate regression. The LAS model is obtained by parameterizing the Softmax function and can modulate the distribution of the heat map and the probability map while converting the heat map into the probability map. Thirdly, we propose a decoupled training strategy compatible with the LAS model, which allows our LAS model to be integrated with existing off-the-shelf heat map based methods without re-training. Finally, we conduct extensive experiments and analysis to validate the effectiveness of the LAS model. The experimental results show that the proposed method solves the optimization discrepancy problem effectively.
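
The coordinate-representation issues in contributions (3) and (4) can be illustrated in one dimension. The following is a minimal sketch under stated assumptions, not the dissertation's implementation: the heat map is an ideal 1-D Gaussian, the Taylor-based decoder uses finite differences of the log heat map, mu = m - D'(m) / D''(m), and `beta` is a hypothetical global softmax scale standing in for a parameterized Softmax (it is not the LAS model itself). The function names and the values of `SIZE`, `MU`, `sigma` and `beta` are chosen only for illustration.

```python
import math

def gaussian_heatmap(mu, size, sigma=2.0):
    """Encode a sub-pixel location mu as a 1-D Gaussian heat map."""
    return [math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) for x in range(size)]

def argmax_decode(heatmap):
    """Decode by taking the index of the maximal activation (whole pixels only)."""
    return max(range(len(heatmap)), key=heatmap.__getitem__)

def taylor_decode(heatmap, eps=1e-10):
    """Refine the argmax with a second-order Taylor expansion of the log heat map."""
    m = argmax_decode(heatmap)
    if m == 0 or m == len(heatmap) - 1:
        return float(m)  # no neighbours for finite differences at the border
    log_h = [math.log(max(h, eps)) for h in heatmap[m - 1:m + 2]]
    d1 = (log_h[2] - log_h[0]) / 2.0           # first derivative at m
    d2 = log_h[2] - 2.0 * log_h[1] + log_h[0]  # second derivative at m
    if d2 >= 0:
        return float(m)  # not a concave peak; fall back to the argmax
    return m - d1 / d2

def soft_argmax(heatmap, beta=1.0):
    """Integral regression: softmax with scale beta, then the expected index."""
    peak = max(heatmap)
    weights = [math.exp(beta * (h - peak)) for h in heatmap]  # stable softmax
    total = sum(weights)
    return sum(x * w for x, w in enumerate(weights)) / total

SIZE, MU = 64, 12.3                  # illustrative map size and true location
hm = gaussian_heatmap(MU, SIZE)      # a "perfect" heat map for MU

coarse = argmax_decode(hm)           # quantized estimate
refined = taylor_decode(hm)          # distribution-aware estimate
plain = soft_argmax(hm, beta=1.0)    # unscaled integral regression
sharp = soft_argmax(hm, beta=50.0)   # sharply scaled integral regression

print("argmax:", coarse)
print("taylor:", refined)
print("soft-argmax beta=1:", plain)
print("soft-argmax beta=50:", sharp)
```

In this toy setting the argmax drops the fractional part of the location, while the Taylor refinement recovers it because the log of an ideal Gaussian is exactly quadratic. The unscaled soft-argmax drifts toward the centre of the map even though the heat map is perfect, since background pixels still receive non-zero probability; a larger scale reduces that drift, which is one way to see why parameterizing the Softmax can matter.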
Keywords/Search Tags: human pose estimation, keypoint localization, keypoint detection, heatmap regression, coordinate regression