Research On Two-dimensional Human Pose Estimation Algorithm Based On Multi-scale Fusion

Posted on: 2022-08-02
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Tang
Full Text: PDF
GTID: 2518306494473014
Subject: Mathematics
Abstract/Summary:
Human pose estimation predicts the positions of keypoints in digital images or videos and then connects them into skeletons according to the structure of the human body. These skeletons allow human behavior to be understood and analyzed, laying a foundation for downstream applications. It is one of the hotspots of computer vision research and is widely applied in video surveillance, search, human-computer interaction, and smart elderly care.

This thesis mainly studies deep-learning-based two-dimensional multi-person pose estimation in static images. Because human joints are highly flexible, body postures in real-life scenes are often complicated. Combined with factors such as the natural environment and the clothing people wear in the scene, the local details of body joints are easily blurred or occluded by other objects and body parts, which makes some keypoints difficult to detect or to locate accurately in the two-dimensional human pose estimation task.

Adopting a top-down two-dimensional multi-person pose estimation method, this thesis introduces pyramid convolutions and attention modules and constructs a two-dimensional multi-person pose estimation model based on multi-scale fusion. The model obtains the local joint information and the global structural information of the human body in the image at the same time, so it can more accurately locate keypoints that are difficult to detect. We use the MS COCO human keypoint dataset to train and test the model. Experiments show that the model achieves higher accuracy.

The main work of this thesis is as follows:

(1) Human bodies appear at different sizes in an image because their distances from the camera differ within the same scene, and the global information in the image is difficult to obtain. This thesis introduces pyramid convolution and an attention module to construct a composite connection structure that combines local and global information. It uses pyramid convolution kernels of different sizes to extract the features of targets of different sizes simultaneously, and it combines them with the attention module to model long-range dependencies in the image. This enables the model to use both the local and the global information of the image to make reasonable inferences about keypoints that are difficult to detect, such as those that are partially hidden, overlapping, or blurred. The model can therefore detect the keypoints of the human body more completely and correctly identify the type of each keypoint.

(2) To address the problem that part of the spatial information is easily lost when a serial deep learning network is down-sampled, a nested skip connection structure is constructed. This structure connects the high-resolution feature map of a shallow layer with the deep feature map that has been upsampled to the corresponding scale. It partially compensates for the spatial information lost during sampling and makes keypoint localization more accurate.
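The two structural ideas described in the abstract, parallel kernels of different sizes and a skip connection that merges a shallow high-resolution feature map with an upsampled deep one, can be illustrated with a minimal plain-Python sketch. It is not the thesis model: the mean filters stand in for learned pyramid convolutions, the attention module is omitted, a single skip connection replaces the nested structure, and every function name, kernel size, and the toy 4x4 map are assumptions chosen for illustration.

```python
# Illustrative sketch only; all names, kernel sizes, and the toy input are
# assumptions, not taken from the thesis.

def box_filter(fmap, k):
    """Mean filter with an odd k x k window and zero padding (same-size output)."""
    h, w, r = len(fmap), len(fmap[0]), k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        total += fmap[y][x]
            out[i][j] = total / (k * k)
    return out

def multi_scale_fuse(fmap, kernel_sizes=(1, 3, 5)):
    """Run kernels of several sizes in parallel and average their responses."""
    branches = [box_filter(fmap, k) for k in kernel_sizes]
    h, w = len(fmap), len(fmap[0])
    return [[sum(b[i][j] for b in branches) / len(branches) for j in range(w)]
            for i in range(h)]

def downsample2(fmap):
    """2x2 average pooling (height and width must be even)."""
    return [[(fmap[i][j] + fmap[i][j + 1] + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
             for j in range(0, len(fmap[0]), 2)]
            for i in range(0, len(fmap), 2)]

def upsample2(fmap):
    """Nearest-neighbour upsampling by a factor of 2."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def skip_fuse(shallow, deep_upsampled):
    """Skip connection: element-wise sum of two same-sized feature maps."""
    return [[s + d for s, d in zip(rs, rd)]
            for rs, rd in zip(shallow, deep_upsampled)]

def argmax2d(fmap):
    """(row, col) of the first maximum in row-major order."""
    best = (0, 0)
    for i, row in enumerate(fmap):
        for j, v in enumerate(row):
            if v > fmap[best[0]][best[1]]:
                best = (i, j)
    return best

# Toy "keypoint response" map with a single sharp peak at row 1, column 2.
peak = [[0.0] * 4 for _ in range(4)]
peak[1][2] = 1.0

# Larger kernels spread the peak's evidence to distant positions (wider context).
context = multi_scale_fuse(peak)

# Down-sampling blurs the peak over a 2x2 block; upsampling cannot undo that.
deep = downsample2(peak)
deep_up = upsample2(deep)

# Adding the shallow map back through the skip connection restores the location.
fused = skip_fuse(peak, deep_up)
print(argmax2d(deep_up), argmax2d(fused))
```

In this toy case the upsampled deep map alone is ambiguous over a 2x2 block, while the fused map peaks at the original position, which is the kind of spatial detail a skip connection is meant to preserve. A real model would use learned convolutions in a deep learning framework rather than fixed mean filters.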
Keywords/Search Tags: pose estimation, deep learning, attention module, skip connection