
Research On Hand Pose Estimation Method Based On Multimodality

Posted on: 2022-11-22  Degree: Master  Type: Thesis
Country: China  Candidate: X J Sun
GTID: 2518306746468704  Subject: Information and Communication Engineering
Abstract/Summary:
Hand pose estimation is an important field in computer vision. It recovers accurate skeletal key-points of the hand from input images and is widely used in security, medical treatment, 3D film production and other fields. Traditional methods take either a depth map or an RGB image as input. Although both families of methods have achieved good results in certain scenarios, their inherent challenges remain. Depth sensors depend heavily on distance and can only be used indoors, so depth-based methods face many limitations in practical applications. RGB-based methods suffer from serious self-occlusion, and RGB images are sensitive to lighting, so these methods place high demands on the scene. The difficulty of annotating 3D hand poses is reflected in the field's lack of large real-world datasets. With the spread of consumer depth cameras, RGB-D images are used more and more widely, and fusing depth information with color information improves the accuracy of computer vision tasks. IR images, another output of depth cameras, perform well in poor light. These challenges, together with the advantages of depth cameras, motivate combining multiple input modalities so that the strengths of each compensate for the weaknesses of the others. This thesis presents a systematic study of hand pose estimation based on multimodal input. The main research work includes the following aspects:

(1) RGB-D images have shown excellent results in vision tasks such as scene segmentation and object detection. Because early depth cameras were expensive, RGB-D images were rarely used in early research, and most researchers chose either an RGB image or a depth map as input. To address the inaccurate 3D pose estimation of current methods based on a single depth map or a single RGB image, this thesis proposes a multimodal cross fusion method that combines depth information with RGB color information, and designs a multimodal cross fusion network model based on RGB-D. The model takes a paired RGB image and depth map as input. After the initial feature extraction module, the RGB modality and the depth modality are fused in the cross fusion module. Finally, the regression module outputs a 3D hand pose with higher accuracy. The cross fusion module is the key contribution; it aims to achieve a better fusion of the two modalities.

(2) The RGB-D method effectively improves the accuracy of the model, but it is limited in practical applications because RGB images are sensitive to lighting. To address the unsatisfactory performance of depth cameras when light is insufficient in practice, a multimodal hand pose estimation method based on IR and depth is proposed, using an IR image as one of the inputs for the first time. This method abandons RGB images, which place high demands on illumination conditions. Instead, it takes as input an IR image and the corresponding depth map, both produced by a depth camera under a near-infrared light source. In the network design, an attention mechanism is introduced to strengthen the learning of important features, and a cascade approach is used to fuse the two modalities layer by layer. The regression module adopts the common heatmap-based regression method, and the final output of the model is the 3D hand key-points. This method performs well in dark or overly bright conditions.

(3) To address the scarcity of real-scene IR datasets in hand pose estimation, this thesis builds the first multimodal dataset with 3D annotation information, the Deptrum Hand dataset. The dataset is captured in real scenes and contains 3D joint annotations, which fills the gap in IR image datasets for hand pose estimation. It contains pairs of IR images and depth images that can be used for training and validating related hand pose estimation methods. It also covers a variety of hand movements and can therefore serve as a hand movement recognition dataset. This thesis trains the multimodal hand pose estimation model based on IR and depth on the Deptrum Hand dataset and verifies the effectiveness of the method. The method achieves leading results on multiple public datasets and on the self-built dataset, and the evaluation also shows that the model performs well and has considerable room for development in practical applications.
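
The abstract says the regression module uses a common heatmap-based approach but does not say how key-point coordinates are read from a heatmap. As a minimal illustration only, the sketch below shows two widely used decoders for a single 2D heatmap: a hard argmax and a soft-argmax (the softmax-weighted mean position). The function names, the `beta` sharpness parameter and the toy heatmap are assumptions made for this example and are not taken from the thesis.

```python
import math


def hard_argmax_2d(heatmap):
    """Return (x, y) of the first highest-scoring cell in a 2D heatmap."""
    best_x, best_y, best_score = 0, 0, heatmap[0][0]
    for y, row in enumerate(heatmap):
        for x, score in enumerate(row):
            if score > best_score:
                best_x, best_y, best_score = x, y, score
    return best_x, best_y


def soft_argmax_2d(heatmap, beta=1.0):
    """Return the expected (x, y) under a softmax over the heatmap scores.

    beta is a hypothetical sharpness parameter: larger values push the
    result towards the hard argmax, smaller values average more widely.
    """
    peak = max(max(row) for row in heatmap)  # subtracted for numerical stability
    total = 0.0
    sum_x = 0.0
    sum_y = 0.0
    for y, row in enumerate(heatmap):
        for x, score in enumerate(row):
            weight = math.exp(beta * (score - peak))
            total += weight
            sum_x += weight * x
            sum_y += weight * y
    return sum_x / total, sum_y / total


# Toy 3x3 heatmap for one key-point, with its peak at column 2, row 1.
toy_heatmap = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 10.0],
    [0.0, 0.0, 0.0],
]

peak_xy = hard_argmax_2d(toy_heatmap)
soft_xy = soft_argmax_2d(toy_heatmap, beta=5.0)
```

The soft-argmax variant is differentiable and gives sub-pixel positions, which is one reason it is often preferred when a network is trained end to end. Whether the thesis uses it, a hard argmax, or another decoder is not stated in the abstract.
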
Keywords/Search Tags: hand pose estimation, convolutional neural network, RGB-D fusion, multimodal fusion, attention mechanism