3D hand pose estimation is an important research topic in computer vision, and its results are widely used in human-computer interaction, virtual reality, rehabilitation and medical care, robot imitation learning, and other fields. Because the human hand has a complex structure, a high number of degrees of freedom, highly variable poses, and highly similar parts, 3D hand pose estimation based on computer vision remains a challenging task. With the rapid development of convolutional neural network technology and the continued construction of large hand datasets, vision-based hand pose estimation has advanced considerably, making accurate and robust hand pose estimation with convolutional neural networks possible. In view of the current problems, this thesis focuses on methods for 3D hand pose estimation based on convolutional neural networks. The main contributions are as follows.

(1) A procedure for generating synthetic datasets of the global hand pose was established. Most existing hand pose datasets are labeled only with the 3D positions of hand joints, lack global hand pose labels, and do not fully cover the hand action space. To address this, a procedure for generating synthetic human hand datasets was established using the Open Scene Graph 3D rendering engine and a 3D human hand model. The procedure can generate depth images and global pose labels of the hand under different gestures, providing a large amount of high-quality training data for global hand pose estimation.

(2) A global hand pose estimation method based on pixel voting was proposed. To address the large errors of existing global hand pose estimation, a convolutional neural network with an encoder-decoder structure was built to generate feature maps of semantic and pose information; hand pixel positions and per-pixel pose votes are then obtained from these feature maps by a semantic segmentation branch and a pose estimation branch, respectively. Finally, the pose votes of the hand pixels are aggregated into the voting result. Experimental results show that this pixel-voting method can robustly and accurately estimate the global hand pose from depth images.

(3) A hand localization and 3D image cropping method based on YOLO-V3 was established. To locate the hand in the original depth image and crop the hand region, a 2D hand center estimation method was built on the YOLO-V3 object detection algorithm to obtain the hand bounding box and 2D center. A hand depth center estimation method then estimates the depth of the hand from the depth values inside part of the bounding box; the 2D center and depth center are combined into the 3D hand center, and a 3D bounding box cropping method finally extracts the hand image. Experimental results show that this method can locate the hand center in the depth image and effectively crop the hand image with the background removed.
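A minimal sketch of the 3D cropping step described in (3) is given below. The function name, the cube size, the use of the median as the depth center, and the camera intrinsics are illustrative assumptions rather than the exact implementation used in the thesis.

```python
import numpy as np

def crop_hand_3d(depth, box, fx, fy, cx, cy, cube_mm=250.0):
    """Illustrative 3D cropping around a hand center estimated from a 2D detection box.

    depth   : depth image in millimeters (H x W)
    box     : (u_min, v_min, u_max, v_max) hand bounding box from the 2D detector
    fx, fy, cx, cy : camera intrinsics (assumed to be known)
    cube_mm : side length of the 3D crop cube (assumed value)
    """
    u_min, v_min, u_max, v_max = box
    u_c, v_c = (u_min + u_max) / 2.0, (v_min + v_max) / 2.0   # 2D center of the box

    # Estimate the depth center from valid depth values inside part of the box.
    patch = depth[int(v_min):int(v_max), int(u_min):int(u_max)]
    valid = patch[patch > 0]
    if valid.size == 0:
        raise ValueError("no valid depth values inside the hand bounding box")
    z_c = float(np.median(valid))

    # Back-project the 2D center and depth center to obtain the 3D hand center.
    x_c = (u_c - cx) * z_c / fx
    y_c = (v_c - cy) * z_c / fy

    # Project a cube around the 3D center back to the image to get the crop window.
    half = cube_mm / 2.0
    u0, u1 = int((x_c - half) * fx / z_c + cx), int((x_c + half) * fx / z_c + cx)
    v0, v1 = int((y_c - half) * fy / z_c + cy), int((y_c + half) * fy / z_c + cy)
    crop = depth[max(v0, 0):v1, max(u0, 0):u1].astype(np.float32)

    # Remove background outside the depth range of the cube.
    crop[(crop < z_c - half) | (crop > z_c + half)] = 0.0
    return crop, (x_c, y_c, z_c)
```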
(4) A hand-object pose estimation method based on anchor regression was proposed. To address pose estimation when the hand is interacting with an object, a hand-object center localization and 3D image cropping method was first established to obtain the hand-object 3D center and crop the hand-object image as the input of the subsequent method. Then, an anchor-regression-based hand-object pose estimation method was built: anchor points are laid evenly over the image, a convolutional neural network predicts each anchor's positional offset and weight with respect to every key point of the hand and the object, and the outputs of all anchors are combined by weighted summation to obtain the key point positions. Experimental results show that the anchor-regression-based method can accurately estimate the poses of the hand and the manipulated object from depth images.
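The weighted anchor aggregation in (4) can be illustrated with a short sketch. The tensor shapes, the softmax normalization of anchor weights, the 2D-only output, and the PyTorch framing are assumptions made for illustration, not the exact design of the thesis network.

```python
import torch

def aggregate_anchor_predictions(anchors, offsets, weights):
    """Combine per-anchor predictions into key point positions by weighted summation.

    anchors : (A, 2)    fixed anchor point coordinates laid evenly over the image
    offsets : (A, K, 2) predicted offset of each anchor to each of K key points
    weights : (A, K)    predicted (unnormalized) weight of each anchor for each key point
    """
    # Normalize the weights over the anchor dimension so they sum to one per key point.
    w = torch.softmax(weights, dim=0)                      # (A, K)

    # Each anchor votes for key point k at position anchor + offset.
    votes = anchors[:, None, :] + offsets                  # (A, K, 2)

    # Weighted sum over all anchors gives the final key point estimates.
    keypoints = (w[..., None] * votes).sum(dim=0)          # (K, 2)
    return keypoints


# Minimal usage example with random tensors (shapes are purely illustrative).
A, K = 256, 21 + 1                     # e.g. 21 hand joints plus one object key point
anchors = torch.rand(A, 2) * 176.0     # anchor grid on an assumed 176 x 176 crop
offsets = torch.randn(A, K, 2)
weights = torch.randn(A, K)
print(aggregate_anchor_predictions(anchors, offsets, weights).shape)  # torch.Size([22, 2])
```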