The Research Of Hand Segmentation And Pose Estimation Based On Depth Image

Posted on: 2021-11-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Z Xu
Full Text: PDF
GTID: 1488306722958159
Subject: Digital media creative project
Abstract/Summary:
As an emerging touchless human-computer interaction (HCI) technology, hand gesture recognition based on depth images has high practical value in games and entertainment, VR/AR, intelligent vehicles, smart homes, holographic projection and similar applications. Developing an efficient, reliable and functional gesture recognition and HCI system nevertheless remains a challenging task. Much of the research effort in gesture recognition has focused on hand segmentation and pose estimation. The chief aim of this research is to build an efficient, natural HCI solution based on depth data, which is less sensitive to variations in skin color and illumination. It achieves accurate hand segmentation and robust hand gesture recognition by processing a single depth image and training deep neural networks. The main work and contributions of this thesis are summarized as follows.

1) Because segmentation results for pixels near the hand boundary are ambiguous, and isolated pixels inside the hand region are unreliable, it is difficult to separate hands from cluttered backgrounds precisely and consistently. To address this issue, a new approach to candidate region generation is proposed. It is based on histogram threshold selection and on tracing the exterior boundaries of objects using depth and flat-space information from a single frame; a hand proposal is then obtained by evaluating the candidates with a shallow convolutional neural network (CNN) for binary classification. This is the first work to apply a histogram-based thresholding algorithm, previously used for gray-image binarization, to separating foreground and background objects in depth images. We find that, after the histogram is smoothed, taking the valley between its two maxima as the threshold, as in the MINIMUM algorithm, is an effective way to segment the scene. The experimental results demonstrate that the approach extracts a highly accurate hand area in real time in the single-hand scenario. Moreover, because the method produces a pure hand proposal segmented from the depth image, it avoids both the interference that a bounding box introduces into subsequent hand gesture recognition and the "boundary pixel" problem caused by ambiguous or unreliable boundary pixels.

2) Instance segmentation for a pair of hands requires classifying the pixels inside the hand region as left hand or right hand. A method that combines fully convolutional networks (FCN) with the histogram-based threshold selection algorithm is proposed. After candidate proposals are generated from depth and flat-space information and the hand region is identified by binary classification, class weight balancing, hybrid dilated convolution (HDC) and skip connections with a concatenation operation are introduced into an improved FCN framework to achieve higher accuracy in pixel-wise classification between the left and right hand. In the last stage, the left-hand and right-hand instance segmentation is fine-tuned. Experimental results show that the model performs better with these improvements to the FCN. Furthermore, the combination of the improved FCN and the histogram-based threshold selection algorithm overcomes the weakness of semantic segmentation caused by ambiguous or unreliable boundary pixels, especially under interference from other objects. After the background and other objects are removed, the method concentrates on fine pixel-level results within the hand region. Even hands with self-occlusions can be segmented consistently by this approach.

3) Because the articulated hand is the most flexible part of the human body, estimating its pose and recognizing its gesture is challenging. To emulate interactions such as moving, clicking and zooming in HCI, fingertips must be detected and identified precisely. The improvement strategy has two aspects. First, confining finger motion to a limited range by in-plane rotation of the depth map significantly decreases the degrees of freedom (DoF) of the hand gesture. A method for computing the optimal rotation angle is applied so that the canonical hand pose is represented more fully than with alternative approaches. Moreover, an angle-mapping technique is introduced for the first time to avoid discontinuous behavior during training. Second, fingertip detection and identification based on an encoder-decoder architecture enhances the generalization ability of the model by using a Gaussian heat map as an intermediate representation. At the same time, the quality loss incurred when up-scaling the heat map is avoided. Compared with existing methods, experimental results show that the approach, optimized by these two strategies, locates fingertips more accurately while running in real time.

4) Because a hand skeleton model is needed to represent the variety of hand appearances and to simulate hand pose, the 3D spatial structure built from depth information should be fully exploited by the gesture recognition algorithm to locate the 21 hand joints precisely in 3D space. However, previous studies mainly treat the depth image as a flat image, so depth data tends to be mapped to gray values during convolution and feature extraction. To address this issue, a 3D CNN approach to hand pose estimation with an end-to-end hierarchical model and physical constraints is proposed. After the 3D spatial structure of the hand is reconstructed from the depth image, the 3D input is converted into a dense voxel grid for data reduction. Gesture recognition or pose estimation is then performed by the 3D CNN. To replace the previous multi-stage cascaded method, a hierarchical approach is proposed that trains the network end to end by fusing a global branch and a local branch into the 3D CNN. Moreover, unlike existing methods that directly rectify unrealistic poses during the validation phase, the approach integrates explicit modeling of physical constraints and spatial relations, such as the collinearity or coplanarity of key joints on each finger, into the estimation model, which avoids unrealistic hand poses and shortens processing time. According to the experimental results, both the mean localization accuracy over all 21 joints and the real-time performance at inference are greatly improved, and the approach consistently outperforms several previous hand pose estimation methods.
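
The histogram-based threshold selection mentioned in contribution 1) can be illustrated with a minimal sketch. It is not the thesis implementation: the function names (local_maxima, valley_threshold), the bin count, the 3-tap mean filter and the convention that a depth of 0 marks a missing measurement are all assumptions made for illustration. The sketch follows the general idea of the MINIMUM algorithm: smooth the depth histogram until only two maxima remain, then take the valley between them as the threshold.

```python
import numpy as np


def local_maxima(h):
    """Indices of bins that rise above their left neighbour and are not below the right one."""
    padded = np.concatenate(([0.0], h, [0.0]))  # zero padding: empty bins are never peaks
    left, mid, right = padded[:-2], padded[1:-1], padded[2:]
    return np.flatnonzero((mid > left) & (mid >= right))


def valley_threshold(depth, bins=256, max_iter=10000):
    """Depth threshold at the valley between the two peaks of the smoothed histogram.

    Assumptions (not from the thesis): depth == 0 means "no measurement",
    and the scene is roughly bimodal (near object vs. background).
    """
    valid = depth[depth > 0]
    hist, edges = np.histogram(valid, bins=bins)
    h = hist.astype(float)
    kernel = np.ones(3) / 3.0  # 3-tap running mean

    peaks = local_maxima(h)
    for _ in range(max_iter):
        if len(peaks) <= 2:
            break
        h = np.convolve(h, kernel, mode="same")  # one smoothing pass
        peaks = local_maxima(h)
    if len(peaks) != 2:
        raise ValueError("histogram is not bimodal")

    lo, hi = peaks
    valley = lo + int(np.argmin(h[lo:hi + 1]))  # lowest bin between the two peaks
    return 0.5 * (edges[valley] + edges[valley + 1])  # centre of the valley bin
```

Pixels with a valid depth below the returned threshold form the near-object (foreground) candidate, for example `mask = (depth > 0) & (depth < valley_threshold(depth))`. In the approach described above, such candidates would still have to go through boundary tracing and the CNN classifier before being accepted as a hand.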
Keywords/Search Tags:HCI, Depth image, Hand semantic segmentation, Hand instance segmentation, Hand pose estimation, Deep Learning