Pose Spatial-temporal Feature Extraction And Matching Based On Graph Neural Network

Posted on: 2021-04-03  Degree: Master  Type: Thesis
Country: China  Candidate: S J Tan  Full Text: PDF
GTID: 2428330611998038  Subject: Computer Science and Technology
Abstract/Summary:
With the development of technology, video and photo data are increasing rapidly. Analyzing the features in these pictures and videos helps us understand human behavior and has important theoretical and practical significance, so we need to analyze the features of poses and pose sequences. In this thesis, we study how to use graph neural networks to extract features from 2D human poses in order to match them to their corresponding targets. Previous works have explored how to use graph convolution to estimate the corresponding 3D pose from a 2D pose, to recognize the corresponding action, and so on, but they assume the natural topology of the human skeleton in the adjacency matrix, which limits the receptive field of the graph convolution. In addition, these methods use only the coordinate information of 2D poses, which cannot overcome the depth ambiguity caused by insufficient information. At the same time, annotating 3D human poses in the wild is very difficult, which greatly limits the generalization of existing in-the-wild pose extraction and matching models. To solve these problems, we propose two main improvements. First, we propose an adaptive semantic graph convolution operator, which learns the strength of the natural connections of the human skeleton while learning the connections between directly connected joints. Second, we exploit ordinal depth information, i.e., whether a joint is closer to the camera than its parent node, to construct the pose graph. On the one hand, this helps reduce the depth ambiguity inherent in 2D poses. On the other hand, it helps overcome the difficulty of obtaining 3D joint labels in the wild. We conducted experiments on three pose spatial-temporal feature extraction and matching tasks: regressing the corresponding 3D pose from a single image, regressing the corresponding 3D pose from an image sequence, and matching skeleton sequences with their corresponding semantic actions. The experimental results show that adaptive semantic graph convolution can better extract the spatial-temporal features of the pose, and that ordinal depth information can effectively resolve the inherent depth ambiguity of 2D poses.
Keywords/Search Tags: Graph Neural Network, Pose Spatial-Temporal Feature Extraction, 3D Human Pose Estimation, Action Recognition
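
The two ideas named in the abstract, ordinal depth labels relative to the parent joint and a graph convolution whose skeleton edge strengths are learnable, can be illustrated with a small sketch. This is not the thesis's implementation: the 5-joint skeleton, the function names, the masked row-softmax weighting, the made-up depth values, and the fixed kernel are all assumptions chosen only to make the idea concrete, and no training is performed.

```python
import math

# Hypothetical 5-joint toy skeleton: PARENT[i] is the parent of joint i (-1 = root).
PARENT = [-1, 0, 1, 0, 3]


def ordinal_depth(depths, parent=PARENT):
    """Return +1 if a joint is closer to the camera than its parent,
    -1 if it is farther, and 0 for the root or a tie."""
    labels = []
    for joint, par in enumerate(parent):
        if par < 0 or depths[joint] == depths[par]:
            labels.append(0)
        else:
            labels.append(1 if depths[joint] < depths[par] else -1)
    return labels


def skeleton_mask(parent=PARENT):
    """Boolean adjacency mask: self-loops plus parent-child bones."""
    n = len(parent)
    mask = [[i == j for j in range(n)] for i in range(n)]
    for joint, par in enumerate(parent):
        if par >= 0:
            mask[joint][par] = mask[par][joint] = True
    return mask


def edge_weights(logits, mask):
    """Row-wise softmax of (learnable) logits, restricted to masked entries.
    Assumed form of 'learned connection strength'; every row has a self-loop,
    so the normaliser is never zero."""
    weights = []
    for row_logits, row_mask in zip(logits, mask):
        exps = [math.exp(l) if m else 0.0 for l, m in zip(row_logits, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def graph_conv(features, weights, kernel):
    """Compute A (X W): transform each joint's features with the kernel,
    then aggregate over the weighted neighbours."""
    out_dim = len(kernel[0])
    transformed = [
        [sum(x * kernel[k][c] for k, x in enumerate(feat)) for c in range(out_dim)]
        for feat in features
    ]
    return [
        [sum(a * t[c] for a, t in zip(row, transformed)) for c in range(out_dim)]
        for row in weights
    ]


# Example with made-up values: 2D coordinates plus an ordinal depth channel.
pose_2d = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (1.0, 0.0), (2.0, 0.0)]
depths = [5.0, 4.5, 4.8, 5.0, 5.6]  # illustrative camera distances only
ordinal = ordinal_depth(depths)
features = [[x, y, float(o)] for (x, y), o in zip(pose_2d, ordinal)]

mask = skeleton_mask()
logits = [[0.0] * 5 for _ in range(5)]  # untrained: uniform over neighbours
weights = edge_weights(logits, mask)
kernel = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # fixed 3 -> 2 projection
output = graph_conv(features, weights, kernel)
```

In this sketch the ordinal labels are derived from metric depths purely for demonstration; the appeal of ordinal depth described in the abstract is that such closer/farther labels are easier to obtain in the wild than full 3D joint coordinates. In a trained model the logits and the kernel would be learned parameters rather than fixed values.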