Human Pose Estimation By Deep Learning

Posted on: 2021-07-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y R Bin
Full Text: PDF
GTID: 2518306104487094
Subject: Control Science and Engineering
Abstract/Summary:
Human pose estimation is the task of localizing body key points in still images. It serves as a fundamental technique for numerous computer vision applications, such as action recognition, person re-identification, and human-computer interaction. This thesis studies deep learning based human pose estimation algorithms.

As body key points are inter-connected, it is desirable to model the structural relationships between them to further improve localization performance. Building on the original graph convolutional network, we propose a novel model, termed Pose Graph Convolutional Network (PGCN), to exploit these relationships for pose estimation. Specifically, our model builds a directed graph between body key points according to the natural compositional model of the human body. Each node (key point) is represented by a 3-D tensor consisting of multiple feature maps, initially generated by our backbone network, to retain accurate spatial information. Furthermore, an attention mechanism is introduced to focus on crucial edges (structural information) between key points. PGCN is then trained to map the graph into a set of structure-aware key point representations that encode both the structure of the human body and the appearance information of specific key points. Additionally, we propose two modules for PGCN: the Local PGCN (L-PGCN) module and the Non-Local PGCN (NL-PGCN) module. The former uses spatial attention to capture correlations between the local areas of adjacent key points and thereby refine key point locations, while the latter captures long-range relationships via a non-local operation to associate challenging key points. Experiments show that the PGCN method improves the performance of the current algorithm.

State-of-the-art methods suffer from insufficient examples of challenging cases such as symmetric appearance, heavy occlusion, and nearby persons. To increase the number of challenging cases, previous methods on the one hand augment images by cropping and pasting image patches with weak semantics, which leads to unrealistic appearance and limited diversity. On the other hand, they augment images in a static way, which cannot take into account the differences between images or the training status of the pose estimation network. We propose Adversarial Semantic Data Augmentation (ASDA), based on a spatial transformer network. First, the training images are segmented by a human parsing algorithm to obtain pure body parts, which are recombined to generate body parts of various semantic granularity. Then, a spatial transformer network dynamically pastes the sampled body parts onto the training image to generate challenging samples. The pose estimation network takes the generated samples as input and tries to learn from them. The spatial transformer network acts as a generator while the pose estimation network acts as a discriminator, and the whole pipeline is trained in an adversarial manner. Experiments show that the ASDA method improves the performance of the current algorithm.

Most existing pose estimation approaches use a multi-stage structure, which provides the network with a mechanism for repeated inference. We present a new model learning strategy, Pose Mirror Distillation (PMD), to further boost the performance of the multi-stage network. Specifically, PMD first trains a mirror pose model to learn the pose structure knowledge that is implicitly contained in the multi-stage output. This knowledge is then extracted by the Multi-Stage Heatmap Fusion (MSHF) module and transferred to the ontology model so that the ontology model achieves better performance. Experiments show that the pose mirror distillation method improves the performance of the multi-stage network.
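
To make the idea of attention over edges in a directed key point graph more concrete, the following is a minimal sketch in plain Python. It is an illustration only, not the thesis's implementation. The three-joint skeleton, the dot-product scoring rule, the residual update, and the function name `attention_graph_step` are all assumptions, and each node is a flat feature vector here rather than the 3-D tensor of feature maps used by PGCN.

```python
import math

# Hypothetical toy skeleton with directed (source, target) edges.
# This is NOT the graph used in the thesis.
EDGES = [("shoulder", "elbow"), ("elbow", "wrist"), ("shoulder", "wrist")]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_graph_step(features, edges):
    """One round of attention-weighted message passing.

    features: dict mapping key point name -> feature vector (list of floats).
    edges: list of directed (source, target) pairs.
    Returns (updated features, attention weight per edge).
    """
    incoming = {name: [] for name in features}
    for src, dst in edges:
        incoming[dst].append(src)

    updated, weights = {}, {}
    for name, own in features.items():
        sources = incoming[name]
        if not sources:
            # No incoming edges: the node keeps its appearance features.
            updated[name] = list(own)
            continue
        # Assumed scoring rule: dot product of target and source features,
        # normalized over the incoming edges of this node.
        alphas = softmax([dot(own, features[s]) for s in sources])
        message = [0.0] * len(own)
        for alpha, s in zip(alphas, sources):
            weights[(s, name)] = alpha
            for i, value in enumerate(features[s]):
                message[i] += alpha * value
        # Assumed residual update: appearance plus structural context.
        updated[name] = [o + m for o, m in zip(own, message)]
    return updated, weights


features = {
    "shoulder": [1.0, 0.0],
    "elbow": [0.0, 1.0],
    "wrist": [0.0, 2.0],
}
updated, weights = attention_graph_step(features, EDGES)
```

In this toy example the wrist has two incoming edges, and the edge from the elbow receives the larger weight because its features are more similar to the wrist's. A learned model would replace the fixed dot product with trainable parameters.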
Keywords/Search Tags:Computer Vision, Human Pose Estimation, Graph Convolutional Network, Spatial Transformer Network, Deep Learning