Human pose estimation is an important research task in computer vision. Its goal is to locate and identify the keypoints of the human body in an image and to connect them into the corresponding human pose according to the relationships between body parts. It underpins tasks such as action recognition, human tracking, and human-computer interaction, and, with the rapid development of artificial intelligence, it is widely used in many practical scenarios. In practical applications, however, human pose estimation still faces many problems and challenges, such as low estimation accuracy, large numbers of network parameters, high computational complexity, and large errors when detecting occluded and difficult keypoints. To address these problems, this paper studies human pose estimation based on convolutional neural networks. The main research results are as follows.

(1) A human pose estimation method based on improved feature fusion and attention is proposed. Building on the high-resolution network (HRNet), the bottleneck block is redesigned with a global context module and depth-wise separable convolution to strengthen the model's context modeling ability. The basic module is redesigned to combine spatial and channel self-attention, which effectively reduces information loss during feature extraction. The network's feature fusion is also optimized by combining multi-resolution features to extract finer feature information. Experimental results show that the improved model effectively raises the prediction accuracy of the original network, improving average precision (AP) on the COCO validation set by 3.2%.

(2) A lightweight network incorporating dense connections is proposed. The bottleneck block and the basic module are redesigned with Ghost convolution, and a new dense connection scheme and dense unit are introduced into the basic module. On this basis, feature extraction during network feature fusion is further strengthened. As a result, the network retains sufficient detection accuracy while its complexity is greatly reduced. Experimental results on the COCO validation set show that, compared with the high-resolution network, the improved network reduces the number of parameters by 71.5% and the computational complexity by 35.2%, while increasing AP by 0.6%.

(3) To address the difficulty of detecting occluded keypoints, a lightweight densely cascaded pyramid network is proposed based on the above research. First, an efficient human detection algorithm is applied before the network input to accurately identify occluded human bodies and reduce interference. Second, attentional feature fusion is used to improve the residual structure of the basic module, enhancing the model's ability to extract features from occluded regions. Finally, GlobalNet and RefineNet are applied to the network output for secondary fusion and inference of the feature information, strengthening the detection of occluded keypoints. In experiments, the average accuracy on the MPII dataset reaches 91.2%, and the detection accuracy under different occlusion ratios on the 3DOH50K dataset is better than that of mainstream methods.
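The abstract credits depth-wise separable convolution in (1) and Ghost convolution in (2) with making the network lighter. The sketch below gives a rough, single-layer illustration of where such savings come from. It counts the weights of one k×k convolutional layer under three schemes:

- the standard convolution;
- the textbook depth-wise separable factorization;
- the Ghost module formulation from GhostNet, in which a primary convolution produces c_out/s "intrinsic" maps and cheap d×d depth-wise operations generate the remaining maps.

This is a minimal sketch under stated assumptions, not the architecture used in this work:

- The layer size (64 to 64 channels, 3×3 kernel) and the Ghost settings s=2 and d=3 are hypothetical.
- Biases and batch-norm parameters are ignored.
- The function names are illustrative only.

Whole-network figures such as the reported 71.5% parameter reduction depend on the full architecture and cannot be derived from this count.

```python
"""Weight counts for ONE k x k conv layer under three schemes (hypothetical sizes;
biases and batch-norm parameters are ignored)."""


def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    # One k x k kernel for every (input channel, output channel) pair.
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    # A k x k depth-wise conv (one kernel per input channel),
    # followed by a 1 x 1 point-wise conv that mixes channels.
    return k * k * c_in + c_in * c_out


def ghost_module_params(c_in: int, c_out: int, k: int, s: int = 2, d: int = 3) -> int:
    # GhostNet-style module (assumed formulation): a primary k x k conv produces
    # m = c_out / s intrinsic maps; each intrinsic map then yields (s - 1)
    # extra "ghost" maps through a cheap d x d depth-wise conv.
    if s < 1 or c_out % s != 0:
        raise ValueError("s must be >= 1 and divide c_out")
    m = c_out // s
    primary = k * k * c_in * m
    cheap = d * d * m * (s - 1)
    return primary + cheap


if __name__ == "__main__":
    c_in, c_out, k = 64, 64, 3  # hypothetical layer size, not taken from the thesis
    std = standard_conv_params(c_in, c_out, k)
    for name, n in [
        ("standard", std),
        ("depth-wise separable", depthwise_separable_params(c_in, c_out, k)),
        ("Ghost (s=2, d=3)", ghost_module_params(c_in, c_out, k)),
    ]:
        print(f"{name:22s} {n:6d} weights ({n / std:.1%} of standard)")
```

For this hypothetical layer, the counts are:

- standard convolution: 36,864 weights;
- depth-wise separable convolution: 4,672 weights (about 12.7% of the standard count);
- Ghost module: 18,720 weights (about 50.8% of the standard count).

With s=2, the Ghost module roughly halves the weights, because the cheap depth-wise term is small next to the primary convolution. Larger s trades more of the layer's output for cheaper generated maps.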