Research On Deep Network Architecture For Human Pose Estimation

Posted on: 2021-01-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K Sun
Full Text: PDF
GTID: 1368330602994246
Subject: Information and Communication Engineering
Abstract/Summary:
Human pose estimation is one of the challenging tasks in computer vision. It aims to localize human keypoints (e.g., head, shoulder, hip) and then group them into human parts (e.g., arms and legs) or a full human pose. As a description of the human body, human pose is widely used in many vision fields, such as pedestrian detection, person re-identification (Re-ID), action recognition and prediction, and human-computer interaction. With the development of deep learning and deep neural networks, human pose estimation has benefited from the evolution of network architectures, and its performance has improved greatly. A human pose estimation framework mainly consists of two parts: a network for modeling the human configuration and a keypoint detection network. In this thesis, we first analyze the characteristics of human pose estimation and then redesign the network architectures for both modeling the human configuration and detecting keypoints. Moreover, we extend the proposed keypoint detection network to other related computer vision tasks, such as image classification and semantic segmentation.

Global and local pose normalization for modeling human configuration. Because human poses have high degrees of freedom, modeling the complicated relations among keypoints remains a challenging problem. We propose global and local pose normalization modules that constrain the degrees of freedom of human poses, so that the normalized relations among keypoints can be modeled by a simple, lightweight network.

High-resolution keypoint detection network. Human pose estimation is sensitive to the loss of spatial resolution: a keypoint detection network needs to generate high-resolution representations in order to localize keypoints accurately. Previous works recover high-resolution representations from low-resolution ones, which results in a loss of spatial resolution. In this thesis, we propose a new network, named High-Resolution Network (HRNet), that maintains high-resolution representations instead of recovering them. Alongside the high-resolution representations, the network also learns low-resolution representations to capture context information, and it uses multi-resolution fusion so that the high- and low-resolution representations enrich each other. We empirically demonstrate that the proposed keypoint detection network achieves superior performance on many benchmarks.

High-resolution network extensions. In this thesis, we explore how to extend HRNet, which was designed for human pose estimation, to other related vision tasks. We design a simple classification head for HRNet and apply it to ImageNet classification; the model pretrained on ImageNet is helpful for training on other vision tasks. Semantic segmentation is likewise viewed as a pixel-wise labeling problem, and it also benefits from eliminating the loss of spatial resolution. We further design a segmentation head for HRNet to handle the diversity of object scales. Compared with mainstream segmentation networks, HRNet achieves superior performance with lower complexity.

In summary, this thesis studies deep neural network architectures for human pose estimation and proposes both a new network for modeling the human configuration and a new keypoint detection network. We evaluate the proposed networks on multiple human pose estimation benchmarks and achieve improved performance.
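The global pose normalization idea summarized above can be illustrated with a minimal sketch. It assumes that normalization is a similarity transform: a chosen root keypoint is moved to the origin, the root-to-anchor axis is rotated onto the +y axis, and all coordinates are divided by the root-anchor distance. The function `normalize_pose` and its `root`/`anchor` parameters are hypothetical; the thesis's actual module may define the canonical frame differently.

```python
import math


def normalize_pose(keypoints, root, anchor):
    """Map 2D keypoints into a canonical frame (hypothetical helper).

    The root keypoint moves to the origin, the root->anchor axis is rotated
    onto the +y axis, and coordinates are divided by the root-anchor distance,
    which removes translation, rotation and scale from the pose.
    """
    rx, ry = keypoints[root]
    ax, ay = keypoints[anchor]
    length = math.hypot(ax - rx, ay - ry)
    if length == 0:
        raise ValueError("root and anchor keypoints coincide")
    # Unit vector along root->anchor; it becomes the new +y direction.
    ux, uy = (ax - rx) / length, (ay - ry) / length
    # The same vector rotated by -90 degrees; it becomes the new +x direction.
    wx, wy = uy, -ux
    normalized = []
    for x, y in keypoints:
        px, py = x - rx, y - ry  # translate so the root is at the origin
        normalized.append(((px * wx + py * wy) / length,
                           (px * ux + py * uy) / length))
    return normalized


print(normalize_pose([(2, 2), (2, 4), (3, 3)], root=0, anchor=1))
```

Under the same assumption, a local normalization could reuse this function on a subset of keypoints (for example, one limb) with part-specific root and anchor keypoints, so that each part is expressed in its own canonical frame.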
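The multi-resolution fusion described above, in which high- and low-resolution representations enrich each other, can be sketched as follows. Each output branch is the sum of all input branches resampled to that branch's resolution. This is a parameter-free simplification under stated assumptions: the learned convolutions HRNet uses for resampling and channel alignment are replaced here by nearest-neighbour upsampling and average pooling, each branch is a single-channel map, and consecutive branches differ by a factor of 2. The function names are hypothetical.

```python
import numpy as np


def upsample_nearest(x, factor):
    # Nearest-neighbour upsampling: repeat each pixel factor x factor times.
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)


def downsample_avg(x, factor):
    # Average pooling over non-overlapping factor x factor windows.
    h, w = x.shape
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def fuse(branches):
    """Exchange information across resolutions (simplified, parameter-free).

    branches[i] is a single-channel map of size (H / 2**i, W / 2**i).
    Each output branch is the sum of every input branch resampled to its size.
    """
    fused = []
    for i, target in enumerate(branches):
        out = np.zeros(target.shape, dtype=float)
        for j, source in enumerate(branches):
            if j == i:
                out += source
            elif j > i:  # lower-resolution source: upsample to the target size
                out += upsample_nearest(source, 2 ** (j - i))
            else:  # higher-resolution source: downsample to the target size
                out += downsample_avg(source, 2 ** (i - j))
        fused.append(out)
    return fused


outs = fuse([np.ones((8, 8)), np.full((4, 4), 2.0), np.full((2, 2), 3.0)])
print([o.shape for o in outs])
```

The key design point this illustrates is that the highest-resolution branch is never replaced by an upsampled copy of a coarse map; it only receives context added from the lower-resolution branches.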
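For the classification head mentioned above, one plausible aggregation is to start from the highest-resolution representation, repeatedly downsample it and add it to the next lower-resolution branch, and finally apply global average pooling. The sketch below assumes all branches share the same channel count and replaces learned strided convolutions with average pooling; `classification_features` is a hypothetical name, and a linear classifier would normally follow the pooled vector.

```python
import numpy as np


def downsample_avg(x, factor=2):
    # Average-pool a (C, H, W) array over non-overlapping factor x factor windows.
    c, h, w = x.shape
    return x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))


def classification_features(branches):
    """Aggregate multi-resolution maps into one feature vector.

    branches[i] has shape (C, H / 2**i, W / 2**i); channel counts are assumed
    equal so that maps from neighbouring resolutions can be added directly.
    """
    x = branches[0]
    for lower in branches[1:]:
        x = downsample_avg(x, 2) + lower  # move down one resolution, then add
    return x.mean(axis=(1, 2))  # global average pooling -> shape (C,)


shapes = [(3, 8, 8), (3, 4, 4), (3, 2, 2), (3, 1, 1)]
print(classification_features([np.ones(s) for s in shapes]))
```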
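For the segmentation head, a simple way to make every pixel see features from all scales is to upsample each lower-resolution branch to the highest resolution, concatenate along the channel axis, and apply a per-pixel linear layer (equivalent to a 1x1 convolution). This sketch assumes a concatenation-style head and uses nearest-neighbour upsampling for simplicity (a real head would more likely use bilinear interpolation); `segmentation_logits` and its `weight` matrix are hypothetical.

```python
import numpy as np


def upsample_nearest(x, factor):
    # Nearest-neighbour upsampling of a (C, H, W) array along H and W.
    return np.repeat(np.repeat(x, factor, axis=1), factor, axis=2)


def segmentation_logits(branches, weight):
    """Per-pixel class scores from multi-resolution maps.

    branches[i] has shape (C_i, H / 2**i, W / 2**i); weight is a hypothetical
    (num_classes, sum(C_i)) matrix acting as a 1x1 convolution.
    """
    feats = [b if i == 0 else upsample_nearest(b, 2 ** i)
             for i, b in enumerate(branches)]
    x = np.concatenate(feats, axis=0)  # (sum C_i, H, W)
    return np.einsum("kc,chw->khw", weight, x)  # (num_classes, H, W)


weight = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
logits = segmentation_logits([np.ones((2, 4, 4)), np.full((1, 2, 2), 2.0)], weight)
print(logits.shape)
```

Concatenating rather than summing keeps the channels from each resolution separate, so the final per-pixel layer can learn how much to rely on fine versus coarse context, which is one way to cope with objects of different scales.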
Keywords/Search Tags: Human Pose Estimation, Global Pose Normalization, Local Pose Normalization, High-Resolution Network, Image Classification, Semantic Segmentation