| The goal of human pose estimation is to predict the locations of human keypoints in an input image. Currently, most algorithms are based on a heatmap representation, in which the true coordinates of each keypoint are encoded as a heatmap for the model to learn from. However, existing heatmap-based methods use the mean squared error (MSE) of the heatmap as the loss function. This loss pays too much attention to the activation value of every pixel in the heatmap, whereas the final accuracy depends only on the coordinates of the heatmap’s maximum. To address this problem, we introduce an explicit constraint on the gradient of the heatmap into the loss function, which guides the model to predict smoother heatmaps. In addition, when the input sample is occluded or the human pose varies, the difficulty of localizing its keypoints changes accordingly. Existing methods do not take this into account when constructing target heatmaps, so constructing target heatmaps that adapt to the difficulty of each input sample is a challenging problem worth studying. To this end, we use a self-distillation training strategy that gradually improves the manually constructed target heatmaps by leveraging the network’s previous knowledge during training. In summary, this dissertation proposes a human pose estimation algorithm based on a heatmap gradient constraint and a self-distillation strategy, with the specific contents as follows: · This dissertation proposes a human pose estimation method based on a heatmap gradient constraint. The heatmap MSE used to measure the quality of the predicted heatmap is overly focused on the activation value of each pixel, which deviates from the actual requirement of accurately locating the heatmap’s maximum. Therefore, we propose an explicit constraint on the gradient of the heatmap, specifically by calculating the MSE
on the gradient map of the heatmap. The proposed heatmap gradient constraint guides the model to predict heatmaps whose shapes are closer to a Gaussian distribution, thereby achieving higher accuracy in predicting keypoint coordinates. The proposed method is independent of the model structure, so it can benefit any heatmap-based human pose estimation model. To study its effectiveness, we evaluate it on models with three different structures and on two widely used benchmark datasets, MPII and COCO. The results show that on both datasets the proposed method achieves consistent performance improvements for all three models, which demonstrates its effectiveness. · This dissertation proposes a human pose estimation method based on a self-distillation strategy. When constructing target heatmaps, existing methods do not assign adaptive uncertainty to input images of different difficulty. Therefore, we propose to use a self-distillation strategy to improve the manually constructed heatmap labels. Specifically, the model uses its own heatmaps, predicted at an earlier stage of training, as soft targets. Predicted heatmaps exhibit prediction uncertainty that adapts to each input sample, and therefore provide supervision on uncertainty. Furthermore, we propose a stepping strategy that increases the distillation distance as the learning rate decays, so that the teacher and the student remain sufficiently different even at low learning rates. Since the self-distillation strategy is independent of the model structure, the proposed method is evaluated on three models with different structures. Experimental results on two widely used benchmark datasets, MPII and COCO, demonstrate the effectiveness of the proposed method. In summary, regarding the research topic of heatmap-based human pose estimation, we introduced an explicit constraint on the heatmap gradients into the loss function to guide the model
to learn smoother predicted heatmaps. Additionally, we improved the handcrafted target heatmaps with a self-distillation strategy that distills soft targets from the heatmaps predicted at earlier training stages into the training labels. This further improved the performance of heatmap-based human pose estimation. |
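
The heatmap gradient constraint described in the abstract can be illustrated with a minimal NumPy sketch. It assumes central finite differences (`numpy.gradient`) as the gradient operator and a hypothetical balancing factor `grad_weight`; neither choice is specified in the abstract, and the function names are illustrative only. The sketch adds the MSE between the gradient maps of the predicted and target heatmaps to the usual pixel-wise MSE.

```python
import numpy as np


def gaussian_heatmap(height, width, center_xy, sigma=2.0):
    """Handcrafted target: a 2D Gaussian centered on the keypoint (x, y)."""
    ys, xs = np.mgrid[0:height, 0:width]
    cx, cy = center_xy
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def gradient_map(heatmap):
    """Finite-difference gradients along y and x (an assumed operator)."""
    dy, dx = np.gradient(heatmap)
    return dy, dx


def heatmap_loss(pred, target, grad_weight=1.0):
    """Pixel-wise MSE plus MSE between the gradient maps.

    grad_weight is a hypothetical balancing factor, not taken from the source.
    """
    pixel_mse = np.mean((pred - target) ** 2)
    pred_dy, pred_dx = gradient_map(pred)
    tgt_dy, tgt_dx = gradient_map(target)
    grad_mse = np.mean((pred_dy - tgt_dy) ** 2) + np.mean((pred_dx - tgt_dx) ** 2)
    return pixel_mse + grad_weight * grad_mse


# Toy example: a clean target and a noisy prediction of the same keypoint.
target = gaussian_heatmap(64, 48, center_xy=(20, 30))
rng = np.random.default_rng(0)
noisy_pred = target + 0.05 * rng.standard_normal(target.shape)

loss_exact = heatmap_loss(target, target)
loss_noisy = heatmap_loss(noisy_pred, target)
loss_noisy_pixel_only = heatmap_loss(noisy_pred, target, grad_weight=0.0)
```

In this toy setting the gradient term penalizes the pixel-to-pixel roughness of the noisy prediction on top of its per-pixel error, which matches the stated intent of steering the model toward smoother, more Gaussian-shaped heatmaps. In a real training pipeline the same computation would be written with differentiable tensor operations.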