With the rapid development of information technology, human pose estimation, an important branch of computer vision, supports many practical applications and is widely used in real-world scenarios such as security monitoring, assisted driving, and virtual reality. However, the task is complicated by occlusion among multiple people, clothing, large scale variation, and highly folded poses. As a result, current algorithms generally suffer from either low accuracy or poor real-time performance caused by an excessive pursuit of accuracy. Oriented toward industrial applications, and addressing the problems that top-down and bottom-up algorithms encounter in practical scenarios, this thesis combines the advantages of both approaches and proposes YOLite Pose, a single-stage human pose estimation network based on object detection. The main contributions of this thesis are threefold.

First, a single-stage human pose estimation network based on object detection is proposed. This thesis improves the YOLO-Pose human pose estimation network and designs a structure optimized with the C2f module, a bidirectional feature pyramid network (BiFPN), and the Smooth L1 loss function, balancing keypoint detection accuracy against inference speed. The C2f module splits the feature channels and aggregates the residual outputs of each layer to obtain rich gradient-flow information, while requiring less convolution computation than the C3 module. The feature fusion network is then rebuilt on BiFPN: redundant computation nodes in the PAN network are removed, long-range information paths are added to improve feature expressiveness, and SE attention is applied to improve feature discrimination, which reduces computation spent on redundant features. Compared with PAN, BiFPN is more lightweight and performs better. Finally, the MSE loss function is replaced with the Smooth L1 loss function, which accounts for the sensitivity of the loss to changes in human scale and density, so that the network evaluates human poses more reasonably and stably. Experiments show that each of the three improvements increases the accuracy and real-time performance of pose estimation, and that the resulting algorithm balances accuracy and inference speed better than existing algorithms.

Second, building on the network from the first contribution, a lightweight human pose estimation network based on the CBAM attention mechanism is proposed. At the input of the last feature-extraction layer, the network applies channel attention and spatial attention to optimize the feature-extraction weights, strengthen the network's attention to different features, and improve pose estimation accuracy. In addition, depthwise separable convolution replaces conventional convolution in the backbone to reduce computation while maintaining good real-time performance. The effectiveness of this algorithm is verified by comparative experiments on public datasets and by visualization results on a self-built Kunqu Opera dataset.

Third, for the practical application scenario of vehicle inspection, this thesis designs and implements an intelligent workstation monitoring system that applies the above algorithms. Based on an analysis of the system requirements, the overall framework of the system is designed, and the implementation of each subsystem is described separately. Finally, the interfaces and functions of the system are demonstrated to verify the effectiveness and practicality of the proposed algorithm.

In summary, from the perspective of practical industrial applications, and targeting the low accuracy or poor real-time performance of existing human pose estimation methods, this thesis proposes YOLite Pose, a single-stage human pose estimation network based on object detection, and further optimizes it into YOLite Pose-v2 by combining the
convolutional block attention module (CBAM) and depthwise separable convolution.
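The abstract replaces PAN with BiFPN, removing redundant nodes and adding long-range input paths. In the original EfficientDet BiFPN, each node merges its inputs with "fast normalized fusion", O = Σ_i w_i·I_i / (ε + Σ_j w_j), where each learnable weight w_i is kept non-negative by a ReLU. The abstract does not say whether YOLite Pose keeps this exact weighting, so the sketch below is an assumption for illustration only. Feature maps are flattened to plain Python lists and are assumed to be already resized to a common resolution, and the function name is hypothetical. A real BiFPN node would also apply a convolution after the fusion, which is omitted here.

```python
def fast_normalized_fusion(inputs, raw_weights, eps=1e-4):
    """Fuse same-shaped feature maps (flattened to lists) at one BiFPN-style node.

    Each learnable scalar weight is clipped at zero (ReLU) and divided by the
    sum of all clipped weights plus eps, following EfficientDet's fast
    normalized fusion. The name and list representation are illustrative only.
    """
    if len(inputs) != len(raw_weights):
        raise ValueError("need exactly one weight per input feature map")
    size = len(inputs[0])
    if any(len(feature) != size for feature in inputs):
        raise ValueError("inputs must be resized to a common shape before fusion")
    weights = [max(0.0, w) for w in raw_weights]  # keep weights non-negative
    total = eps + sum(weights)                     # eps avoids division by zero
    normalized = [w / total for w in weights]
    return [sum(n * feature[i] for n, feature in zip(normalized, inputs))
            for i in range(size)]


if __name__ == "__main__":
    # Hypothetical flattened features arriving at an output P4 node.
    p4_input = [1.0, 2.0, 3.0]     # original backbone feature (the long-range path)
    p4_top_down = [0.5, 0.5, 0.5]  # intermediate top-down feature
    p3_down = [2.0, 0.0, 1.0]      # bottom-up feature from P3, already resized
    fused = fast_normalized_fusion([p4_input, p4_top_down, p3_down], [1.0, 0.5, -0.2])
    print(fused)
```

In the demo, the negative raw weight on the P3 input is clipped to zero, so that input contributes nothing until training pushes its weight above zero.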
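The first contribution swaps MSE for Smooth L1 in keypoint regression. The sketch below compares the two losses and their gradients on a handful of residuals. It assumes the common definition with a threshold `beta=1.0` (the default used by, for example, PyTorch's `SmoothL1Loss`): quadratic for |x| < beta and linear beyond it. This is not the thesis's implementation, and the residual values in the demo are hypothetical. The point is that one badly localized keypoint produces a bounded gradient under Smooth L1, whereas under MSE its gradient grows with the error.

```python
def smooth_l1(error: float, beta: float = 1.0) -> float:
    """Smooth L1 loss for one residual: quadratic near zero, linear beyond beta."""
    abs_err = abs(error)
    if abs_err < beta:
        return 0.5 * abs_err ** 2 / beta
    return abs_err - 0.5 * beta


def smooth_l1_grad(error: float, beta: float = 1.0) -> float:
    """Derivative of smooth_l1 with respect to the residual (bounded by 1)."""
    if abs(error) < beta:
        return error / beta
    return 1.0 if error > 0 else -1.0


def mse(error: float) -> float:
    return error ** 2


def mse_grad(error: float) -> float:
    return 2.0 * error


def mean_loss(residuals, loss_fn):
    """Average a per-residual loss over a list of residuals."""
    return sum(loss_fn(r) for r in residuals) / len(residuals)


if __name__ == "__main__":
    # Hypothetical keypoint residuals: four accurate predictions and one outlier.
    residuals = [0.2, -0.4, 0.1, 0.3, 8.0]
    print("MSE      :", mean_loss(residuals, mse))
    print("Smooth L1:", mean_loss(residuals, smooth_l1))
    print("outlier gradient, MSE vs Smooth L1:", mse_grad(8.0), smooth_l1_grad(8.0))
```

For the outlier residual of 8.0, the MSE gradient is 16.0, while the Smooth L1 gradient is capped at 1.0, so a single hard keypoint cannot dominate the update.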
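The second network applies CBAM (channel attention followed by spatial attention) before its last feature-extraction layer. The pure-Python sketch below follows the published CBAM formulation. Channel weights come from a shared two-layer MLP applied to the global average-pooled and max-pooled descriptors, summed and passed through a sigmoid. Spatial weights come from a convolution over the channel-wise average and max maps, also followed by a sigmoid. The SE attention used in the first network is closely related to the channel branch, but it uses average pooling only. Assumptions made for illustration: feature maps are nested lists indexed `[channel][row][col]`, weights are supplied by the caller, biases are omitted, and the demo uses a 3×3 spatial kernel where the CBAM paper uses 7×7. This is not the thesis's implementation.

```python
import math
import random


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def shared_mlp(vec, w1, w2):
    """Two-layer MLP without biases: C -> C/r (ReLU) -> C."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(x, w1, w2):
    """Scale each channel by sigmoid(MLP(avg_pool(x)) + MLP(max_pool(x)))."""
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    mx = [max(max(row) for row in ch) for ch in x]
    logits = [a + m for a, m in zip(shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2))]
    scale = [sigmoid(z) for z in logits]
    return [[[scale[c] * v for v in row] for row in ch] for c, ch in enumerate(x)]


def spatial_attention(x, kernel):
    """Scale each position by sigmoid(conv([mean over channels; max over channels])).

    kernel has shape 2 x k x k (k odd); zero padding keeps the H x W size.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    avg = [[sum(x[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    mx = [[max(x[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    k = len(kernel[0])
    pad = k // 2
    attn = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            s = 0.0
            for m, fmap in enumerate((avg, mx)):
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < H and 0 <= jj < W:
                            s += kernel[m][di][dj] * fmap[ii][jj]
            attn[i][j] = sigmoid(s)
    return [[[attn[i][j] * x[c][i][j] for j in range(W)] for i in range(H)]
            for c in range(C)]


def cbam(x, w1, w2, kernel):
    """Channel attention first, then spatial attention (CBAM's sequential order)."""
    return spatial_attention(channel_attention(x, w1, w2), kernel)


if __name__ == "__main__":
    random.seed(0)
    C, H, W, r = 4, 5, 5, 2  # hypothetical sizes; r is the MLP reduction ratio
    x = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
    w1 = [[random.gauss(0, 0.5) for _ in range(C)] for _ in range(C // r)]
    w2 = [[random.gauss(0, 0.5) for _ in range(C // r)] for _ in range(C)]
    kernel = [[[random.gauss(0, 0.2) for _ in range(3)] for _ in range(3)] for _ in range(2)]
    y = cbam(x, w1, w2, kernel)
    print(len(y), len(y[0]), len(y[0][0]))
```

Because both attention maps lie in (0, 1), CBAM only reweights features and never changes their shape. That property is what allows the module to be inserted in front of an existing layer without altering the rest of the network.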
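The second network replaces standard convolutions in the backbone with depthwise separable ones to cut computation. The arithmetic below compares the weights and multiply-accumulates (MACs) of one layer under both schemes. It ignores biases and batch normalization, and the layer sizes are hypothetical rather than taken from the thesis. A k×k standard convolution costs k²·c_in·c_out weights per output position. The separable version costs k²·c_in + c_in·c_out, so the cost ratio is 1/c_out + 1/k².

```python
def standard_conv_cost(c_in, c_out, k, h_out, w_out):
    """Weights and multiply-accumulates (MACs) of a k x k convolution, biases ignored."""
    params = k * k * c_in * c_out
    return params, params * h_out * w_out


def depthwise_separable_cost(c_in, c_out, k, h_out, w_out):
    """Same layer factored into a k x k depthwise conv plus a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    params = depthwise + pointwise
    # Both stages produce an h_out x w_out map (any stride is applied in the depthwise stage).
    return params, params * h_out * w_out


if __name__ == "__main__":
    # Hypothetical backbone layer: 3 x 3 conv, 128 -> 256 channels, 40 x 40 output.
    std = standard_conv_cost(128, 256, 3, 40, 40)
    dws = depthwise_separable_cost(128, 256, 3, 40, 40)
    print("standard (params, MACs):", std)
    print("depthwise separable (params, MACs):", dws)
    print("ratio:", dws[1] / std[1], "vs 1/c_out + 1/k^2 =", 1 / 256 + 1 / 9)
```

For this example layer, the depthwise separable version needs 33,920 weights instead of 294,912, about 11.5% of the original. With 3×3 kernels, the 1/k² term dominates, which is why the saving approaches roughly 9× for wide layers.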