Modeling Long-range Relations For Human Pose Estimation

Posted on: 2020-06-13 | Degree: Master | Type: Thesis
Country: China | Candidate: T Qi | Full Text: PDF
GTID: 2428330620959996 | Subject: Computer Science and Technology
Abstract/Summary:
Human pose estimation is a challenging problem in computer vision. It requires a model to estimate the positions of body parts in an image. These parts can be highly diverse, occluded, and spatially related. These characteristics make pose estimation hard for traditional methods, which lack the filter diversity needed to extract features and the ability to summarize global context. With the development of deep learning, pose estimation models have benefited significantly from CNN frameworks and have made promising progress in recent years.

The pose estimation task often requires modeling spatial relations. Discriminating a body part sometimes requires support from other parts; determining which side of the body a part belongs to can benefit from information about the person's facing orientation or the side of a hand; and the grouping of keypoints used in multi-person estimation depends on other keypoints. To model long-range relations such as these, recently proposed models use deeper and deeper networks to obtain a larger receptive field. Although the deep layers of these models achieve a large receptive field, the shallow layers can only extract features from a small area. Sometimes only a little extra information from a distant position is needed to determine a keypoint. If we supply such information explicitly, instead of leaving the network to model the relation in its deep layers, we can move detection ability to earlier layers and enlarge the receptive field of the deep layers, thereby improving the network's ability to summarize global information.

In this paper we explore two approaches to enlarging the receptive field and analyze their performance. First, we propose a long-range relation module that includes a feature translation process with fixed offsets. We also propose a feature shifting module with learnable offsets, together with a correlation attention mechanism. Our contributions are:

1. We propose a module with a fixed-offset feature-shifting process. This module can be approximated as a kind of dilated convolution (DConv), called cross-channel dilated convolution (XDConv). We explain why we use XDConv and why its smaller number of parameters does not undermine performance. Experiments demonstrate the module's superior performance and the influence of dilation size. They also show that XDConv performs on par with DConv.
2. We further propose a feature shifting module with learnable offsets. A correlation attention module is also introduced to regulate the shifted features. We show that this module can be viewed as a convolution operation with a dynamic receptive field shape. In experiments, the module achieves better performance with a smaller structure. We also analyze how much the module contributes to the detection of each keypoint category.
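The fixed-offset feature shifting in the first contribution can be illustrated with a small NumPy sketch. The abstract does not give the module's exact design, so everything here is an assumption made for illustration: the channels are split into nine groups, each group is translated by one tap of a 3x3 grid scaled by a dilation size, borders are zero-filled, and a 1x1 convolution then mixes the shifted channels. The function names `shift2d`, `fixed_offset_shift`, and `pointwise_conv` are hypothetical and are not taken from the thesis.

```python
import numpy as np


def shift2d(x, dy, dx):
    """Translate a (C, H, W) feature map by (dy, dx) pixels, zero-filling borders."""
    _, h, w = x.shape
    out = np.zeros_like(x)
    # A shift as large as the map leaves nothing visible.
    if abs(dy) >= h or abs(dx) >= w:
        return out
    # Matching destination and source windows that stay inside the map.
    ys_dst = slice(max(dy, 0), h + min(dy, 0))
    xs_dst = slice(max(dx, 0), w + min(dx, 0))
    ys_src = slice(max(-dy, 0), h + min(-dy, 0))
    xs_src = slice(max(-dx, 0), w + min(-dx, 0))
    out[:, ys_dst, xs_dst] = x[:, ys_src, xs_src]
    return out


def fixed_offset_shift(x, dilation):
    """Assumed scheme: nine channel groups, one 3x3 grid offset per group."""
    offsets = [(i * dilation, j * dilation) for i in (-1, 0, 1) for j in (-1, 0, 1)]
    groups = np.array_split(x, len(offsets), axis=0)
    shifted = [shift2d(g, dy, dx) for g, (dy, dx) in zip(groups, offsets)]
    return np.concatenate(shifted, axis=0)


def pointwise_conv(x, weight):
    """1x1 convolution over channels; weight has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


# Usage: 18 input channels, dilation 4, mixed down to 8 output channels.
rng = np.random.default_rng(0)
features = rng.standard_normal((18, 32, 32))
mixing = rng.standard_normal((8, 18))
result = pointwise_conv(fixed_offset_shift(features, dilation=4), mixing)
print(result.shape)
```

In this sketch each output position combines values taken from nine dilated locations, but every input channel contributes from only one of them. The 1x1 mixing therefore uses one weight per input-output channel pair, where a dense 3x3 dilated convolution would use nine, which is one way to read the abstract's remark about fewer parameters.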
Keywords/Search Tags: Human pose estimation, deep learning, long-range relation