Human pose estimation is an important research area in computer vision. Its goal is to recover human pose by detecting key body parts or joints in a given image or video. In recent years, with the development of deep learning, results from this research have been widely applied in behavior recognition and other fields. Simple Baseline is a simplified human pose estimation model. Because it strikes a good balance between model complexity and pose estimation performance, it has been applied successfully in many fields. However, the model still has some problems, and this paper proposes solutions to them: (1) A human pose estimation model based on multi-scale features is proposed, which adds a multi-resolution deconvolution module and an online hard keypoint mining module. First, the feature map generated at each resolution of the feature extraction module is fused with the features obtained by the heatmap generation module, which combines the local feature information extracted in the network with the global feature information. Second, the online hard keypoint mining module selects hard keypoints for training based on a dynamic loss function and uses the selected keypoints to update the network parameters through back-propagation. Combining the two modules addresses the problems of identifying occluded keypoints and detecting small-scale keypoints. (2) A human pose estimation model based on aggregated residual blocks and an attention mechanism is proposed. First, group convolution is used to introduce cardinality, which controls the number of groups, reduces the number of hyperparameters, and improves the efficiency of the model. Second, a channel attention mechanism is added to the feature extraction module to help the model learn the contextual relationships between image feature channels, so that important image features are extracted more effectively. (3) A multi-stage human pose estimation model based on multi-scale features is proposed. First, a single-stage module based on the Simple Baseline network is designed to reduce information loss between stages. Second, a cross-stage feature fusion strategy is designed to fuse the features of adjacent stages. This structure retains the flexibility of high resolution and repeatedly combines low-resolution and high-resolution features. It can not only focus on the features of the keypoints themselves but also make judgments based on the context of the entire image, taking into account both location information and abstract feature information, which helps transfer information from earlier stages to later ones.
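
The hard keypoint selection described in contribution (1) can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a per-keypoint mean squared error between predicted and target heatmaps and a fixed `top_k`, whereas the abstract refers to a dynamic loss function whose details are not given here. The function names and the flat-list heatmap layout are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def per_keypoint_mse(pred: Sequence[Sequence[float]],
                     target: Sequence[Sequence[float]]) -> List[float]:
    """Return one mean squared error per keypoint heatmap.

    Assumed layout: pred[k] and target[k] are the flattened heatmaps
    of keypoint k (hypothetical, for illustration only).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of keypoints")
    losses = []
    for p_map, t_map in zip(pred, target):
        if len(p_map) != len(t_map) or len(p_map) == 0:
            raise ValueError("heatmaps must be non-empty and of equal size")
        squared = [(p - t) ** 2 for p, t in zip(p_map, t_map)]
        losses.append(sum(squared) / len(squared))
    return losses


def select_hard_keypoints(losses: Sequence[float], top_k: int) -> List[int]:
    """Return the indices of the top_k keypoints with the largest loss."""
    if not 1 <= top_k <= len(losses):
        raise ValueError("top_k must be between 1 and the number of keypoints")
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return order[:top_k]


def hard_keypoint_loss(pred: Sequence[Sequence[float]],
                       target: Sequence[Sequence[float]],
                       top_k: int) -> float:
    """Average the loss over the hardest top_k keypoints only.

    Only these keypoints would contribute gradients during back-propagation.
    """
    losses = per_keypoint_mse(pred, target)
    hard = select_hard_keypoints(losses, top_k)
    return sum(losses[i] for i in hard) / top_k


# Toy example: three keypoints, two-pixel heatmaps.
pred = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
target = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
print(hard_keypoint_loss(pred, target, top_k=2))  # 0.625
```

In the toy example the per-keypoint losses are 0.0, 1.0, and 0.25, so the two hardest keypoints are the second and third, and the easy first keypoint is excluded from the averaged loss.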