The Research On Attention Algorithm For Human Pose Estimation

Posted on: 2023-05-26
Degree: Master
Type: Thesis
Country: China
Candidate: L Peng
GTID: 2558306911457334
Subject: Engineering
Abstract/Summary:
Human pose estimation is a basic research direction in computer vision. Its goal is to locate all human keypoints in an image or video frame. These keypoints are usually the joints of the human body, but depending on the task they may also include artificially defined non-joint points. In recent years, the development and application of VR, AR, and human-computer interaction, together with the growing demand for enhanced monitoring and security, have greatly promoted research on human pose estimation and made it one of the popular fields of computer vision.

Human pose estimation methods can be categorized by several attributes. By the coordinate dimension of the predicted keypoints, they divide into 3D human pose estimation, which predicts 3D spatial locations, and 2D human pose estimation, which predicts 2D locations. By the number of persons in an image, they divide into multi-person and single-person pose estimation; the former is more challenging than the latter, so most recent research addresses it. Whatever the method, pose estimation faces several challenges, which arise from occlusion, changes in viewing angle, clothing or texture perturbation, illumination change, and background change.

To address background change, this thesis proposes a method using a positional attention mechanism, which lets the network focus on the person, reduces the capture of background information, and effectively improves the network's robustness to background change. In addition, to address the difficulty most traditional methods have in balancing performance and inference speed, the thesis proposes a method using a channel sifted network, which reduces the number of parameters through a lightweight backbone network while maintaining high performance. The main work of this thesis is as follows:

(1) To address the problem that the performance of existing human pose estimation methods is easily disturbed by the background, a 2D pose estimation method using a positional attention module is proposed. The network built by this method consists of three parts. First, a ResNet backbone is concatenated with a group of deconvolution layers to form a global branch; the backbone extracts keypoint position information from the image, and the deconvolution group raises the resolution of the feature maps. Then, an attention branch containing a positional attention module takes as input the fusion of feature maps extracted at different depths of the network and produces attention feature maps. Finally, the feature map containing position information extracted by the global branch is fused with the attention feature map, and a convolution layer reduces the channel dimension of the fused feature map to generate keypoint heatmaps. Experiments show that the method handles the perturbation caused by background change well and significantly improves keypoint detection performance. The network achieves AP scores of 71.2 and 70.7 on the COCO validation set and test set, respectively.

(2) To address the problem that existing pose estimation methods struggle to balance performance and inference speed, a lightweight 2D pose estimation method using a channel sifted network is proposed. The network uses a lightweight ResNet as its backbone, composed of redesigned lightweight residual modules. Because reducing the number of parameters may weaken the network's learning ability, a channel attention branch is also proposed; it effectively reduces the negative impact of redundant channels in the feature map on keypoint heatmap prediction. Experiments show that the network has fewer parameters and faster inference than most existing networks while retaining considerable performance. The network achieves AP scores of 69.6 and 69.2 on the COCO validation set and test set, respectively.
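
As a rough illustration of the channel attention idea in (2) and the heatmap output in (1), the sketch below gates each channel of a small feature map, mixes the channels into keypoint heatmaps with a 1x1 convolution, and reads each keypoint off as the heatmap maximum. The abstract does not specify the thesis's modules, so everything here is an assumption: the gate follows a generic squeeze-and-excitation pattern (global average pooling followed by a sigmoid), the function names are hypothetical, and the weights are hand-picked rather than learned.

```python
import math


def global_average_pool(feature_map):
    """Return one mean value per channel of a C x H x W nested list."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_gate(feature_map, weights, biases):
    """Scale every channel by a gate in (0, 1) computed from pooled statistics.

    weights (C x C) and biases (length C) are hypothetical stand-ins for
    learned parameters; a gate near 0 suppresses a redundant channel.
    """
    pooled = global_average_pool(feature_map)
    gates = [
        sigmoid(sum(w * p for w, p in zip(row, pooled)) + b)
        for row, b in zip(weights, biases)
    ]
    gated = [
        [[g * value for value in row] for row in channel]
        for g, channel in zip(gates, feature_map)
    ]
    return gated, gates


def pointwise_conv(feature_map, kernel):
    """1x1 convolution: mix C input channels into K maps (kernel is K x C)."""
    height, width = len(feature_map[0]), len(feature_map[0][0])
    return [
        [
            [sum(k * ch[y][x] for k, ch in zip(k_row, feature_map))
             for x in range(width)]
            for y in range(height)
        ]
        for k_row in kernel
    ]


def decode_keypoints(heatmaps):
    """Take the location of each heatmap's maximum as the (x, y) keypoint."""
    keypoints = []
    for heatmap in heatmaps:
        _, x, y = max(
            ((value, x, y)
             for y, row in enumerate(heatmap)
             for x, value in enumerate(row)),
            key=lambda item: item[0],
        )
        keypoints.append((x, y))
    return keypoints


# Toy input: 2 channels of size 2 x 3, each with a single peak.
features = [
    [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0]],  # peak at x=1, y=0
    [[0.0, 0.0, 0.0], [0.0, 0.0, 2.0]],  # peak at x=2, y=1
]
identity = [[1.0, 0.0], [0.0, 1.0]]  # hand-picked, not learned

gated, gates = channel_gate(features, identity, biases=[0.0, 0.0])
heatmaps = pointwise_conv(gated, identity)  # one heatmap per keypoint
keypoints = decode_keypoints(heatmaps)
print(keypoints)  # [(1, 0), (2, 1)]
```

In a real network the gate parameters and the 1x1 kernel would be trained, and the tensors would live in a deep learning framework rather than nested lists; the nested-list form is used only to keep the sketch dependency-free.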
Keywords/Search Tags:Deep Learning, Human Pose Estimation, Attention Mechanism, Lightweight Network, Computer Vision