Research On 3D Hand Pose Estimation Based On Skeleton And Deep Learning

Posted on: 2021-12-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Li
GTID: 2518306536487444
Subject: Information and Communication Engineering
Abstract/Summary:
Hand gestures are an important means of interaction, both between people and between people and the world. With the development of artificial intelligence technology, hand gestures have been applied in scenarios such as human-computer interaction, virtual reality, and augmented reality, and have become a research hotspot. With the spread of inexpensive depth cameras, 3D hand pose estimation based on depth images has made great progress. However, hands are difficult to recognize because they are highly deformable and prone to self-occlusion, so there is still much room to improve hand pose estimation. We therefore use hand images captured by a depth camera and apply deep learning techniques to explore methods that further improve estimation accuracy.

We design a skeleton-based hand pose estimation method that combines depth images with hand skeletons. The framework contains two parts. First, we use a deep generative network to model the feature space of the hand skeleton. We adopt a classic variational autoencoder: the encoder takes the skeleton's joint coordinates as input and compresses them into a code, and a decoder whose structure mirrors the encoder accurately recovers the joint coordinates from that code. Training on a public dataset yields a reliable skeleton encoder-decoder network.

Second, building on the trained skeleton model, we design a multi-task learning mechanism that simultaneously models the feature space of the depth images and aligns it with the feature space of the skeleton. Because depth images of hands are complex, we design a depth encoding network that extracts a low-dimensional feature from the input depth image. For this feature to be effective, it must satisfy two conditions: it must allow the original depth image to be reconstructed through a network, and it must be mappable to the corresponding skeleton code, which achieves pose estimation. We therefore impose two constraints, a depth-image reconstruction constraint and a low-dimensional mapping constraint from depth image to skeleton, and correspondingly introduce a decoding network and a mapping network, which the multi-task mechanism trains synchronously. This work realizes a simple mapping between depth images and hand skeletons in the low-dimensional feature space and thereby solves the depth-based estimation problem. We introduce multi-task learning because depth images and skeletons are essentially different modalities of the same object: the two tasks of feature-space modeling and feature-space alignment share information, and multi-task learning exploits this complementarity so that both tasks obtain better results.

Finally, we evaluate the effectiveness of the proposed method with self-comparison (ablation) experiments and test it on two public hand gesture datasets. Compared with existing methods, the proposed design is more accurate.
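The skeleton feature-space model described above is a variational autoencoder over joint coordinates. Below is a minimal PyTorch sketch of that idea; the joint count, latent size, layer widths, and KL weight are illustrative assumptions, since the abstract gives no architectural details.

```python
import torch
import torch.nn as nn

NUM_JOINTS, LATENT_DIM = 21, 30  # assumed sizes, not stated in the thesis

class SkeletonVAE(nn.Module):
    """Variational autoencoder over flattened 3D joint coordinates."""
    def __init__(self):
        super().__init__()
        d_in = NUM_JOINTS * 3
        self.encoder = nn.Sequential(nn.Linear(d_in, 256), nn.ReLU(),
                                     nn.Linear(256, 128), nn.ReLU())
        self.fc_mu = nn.Linear(128, LATENT_DIM)
        self.fc_logvar = nn.Linear(128, LATENT_DIM)
        # The decoder mirrors the encoder, as the abstract describes.
        self.decoder = nn.Sequential(nn.Linear(LATENT_DIM, 128), nn.ReLU(),
                                     nn.Linear(128, 256), nn.ReLU(),
                                     nn.Linear(256, d_in))

    def forward(self, joints):  # joints: (B, NUM_JOINTS * 3)
        h = self.encoder(joints)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.decoder(z), mu, logvar

def vae_loss(recon, joints, mu, logvar, beta=1e-3):
    # Coordinate reconstruction term plus KL regularizer on the code.
    rec = nn.functional.mse_loss(recon, joints)
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + beta * kld
```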
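The second stage trains a depth encoder under the two constraints named above: a reconstruction constraint (a decoding network rebuilds the depth image from the feature) and a mapping constraint (a mapping network sends the feature to the code of the pretrained, frozen skeleton model). Continuing the sketch above, with the CNN layout, the 128x128 input resolution, and the loss weights all assumed rather than taken from the thesis:

```python
import torch
import torch.nn as nn

class DepthEncoder(nn.Module):
    """CNN that compresses a 1x128x128 depth image into a low-dimensional feature."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),    # 128 -> 64
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(), nn.Linear(128 * 16 * 16, feat_dim))

    def forward(self, x):
        return self.net(x)

# depth_dec is any decoder that rebuilds the depth image from the feature;
# mapper can be as simple as nn.Linear(128, LATENT_DIM) into the skeleton code.
def multitask_step(depth_enc, depth_dec, mapper, skel_vae, depth, joints,
                   w_rec=1.0, w_map=1.0):  # weights are illustrative
    feat = depth_enc(depth)
    loss_rec = nn.functional.mse_loss(depth_dec(feat), depth)
    with torch.no_grad():  # the pretrained skeleton VAE stays fixed
        target_code = skel_vae.fc_mu(skel_vae.encoder(joints))
    loss_map = nn.functional.mse_loss(mapper(feat), target_code)
    return w_rec * loss_rec + w_map * loss_map
```

Summing the two losses and backpropagating through the encoder, decoder, and mapper together is one straightforward way to realize the synchronous multi-task training the abstract describes.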
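At test time the pipeline implied by the abstract chains the depth encoder, the mapping network, and the skeleton decoder to produce 3D joints. A hypothetical usage sketch, continuing the code above:

```python
import torch

# Hypothetical inference: depth image -> depth feature -> skeleton code
# -> 3D joint coordinates via the pretrained skeleton decoder.
def estimate_pose(depth_enc, mapper, skel_vae, depth):
    with torch.no_grad():
        code = mapper(depth_enc(depth))   # align modalities in the latent space
        joints = skel_vae.decoder(code)   # (B, NUM_JOINTS * 3)
    return joints.view(-1, NUM_JOINTS, 3)
```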
Keywords/Search Tags:deep learning, hand gesture estimation, feature space modeling, multi-task learning, low-dimensional mapping