
Research On Key Technologies Of Vision-based Hand Gesture Interaction

Posted on: 2022-02-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L P Fang
Full Text: PDF
GTID: 1488306569958699
Subject: Control Science and Engineering
Abstract/Summary:
With the rapid development of computer vision technology and intelligent terminal equipment, vision-based hand gesture interaction has become a widely used mode of human-computer interaction because it is natural, friendly, and convenient. It involves three key technologies: vision-based hand gesture recognition, hand pose estimation, and hand gesture authentication. These three tasks are closely related and have long been active research topics in academia and industry.

All three tasks process hand gesture images or videos. Because the hand occupies a small area of the image or video and the hand joints are highly flexible, the tasks face difficulties such as low image resolution, diversity of viewpoint, similarity and occlusion between fingers, and motion blur caused by rapid hand movement. These difficulties make vision-based hand gesture interaction challenging. To this end, for hand gesture recognition and hand pose estimation, this dissertation conducts in-depth research on designing algorithms that balance accuracy and real-time performance. For hand gesture authentication, it carries out extensive research on constructing a dataset and designing an authentication algorithm, with the aim of promoting further development in this field. The main research achievements and contributions are fourfold:

(1) A static hand gesture recognition method based on hand geometric features and the Fisher vector is proposed. This method constructs a novel feature representation of the 2D shape of the hand to meet the requirements of recognition accuracy and efficiency. The core idea is to extract a rotation- and scale-invariant local descriptor for each contour point of the hand, encode all the local descriptors with the Fisher vector encoding method to obtain the Fisher vector of the 2D shape of the hand, and finally classify the resulting Fisher vector with an SVM classifier. Experimental results on five public datasets show that the proposed method achieves a better trade-off between recognition accuracy and efficiency than previous traditional methods and mainstream deep learning methods.

(2) A dynamic hand gesture recognition method based on the feature covariance matrix is proposed. This method provides a unified framework for dynamic gesture recognition across different input modalities, including RGB hand gesture videos, depth hand gesture videos, and 3D hand skeleton sequences. The core idea is to track hand points of interest, extract local descriptors representing their positions, movements, etc., fuse all local descriptors within a dynamic hand gesture into a low-dimensional and compact feature covariance matrix descriptor, and finally classify the resulting descriptor with an SVM classifier. Experimental results on three public datasets with different modalities show that the proposed method achieves a better trade-off between recognition accuracy and efficiency than previous traditional methods and mainstream deep learning methods.

(3) A 3D hand pose estimation method based on a joint graph reasoning and pixel-to-offset prediction network is proposed. This method estimates the uvz coordinates of hand joints from hand depth images in a dense prediction manner and uses the uvz coordinates of hand joints directly as the ground truth for end-to-end training, effectively combining the advantages of direct regression based methods and dense prediction based methods. The main novelties are two core modules: a joint graph reasoning (JGR) module that enhances the feature representation ability of each pixel, and a pixel-to-offset (P2O) prediction module that predicts the offset vectors from each pixel to the hand joints. Experimental results on three public datasets show that the proposed method has significant advantages in model size, estimation accuracy, and inference speed.

(4) The largest dataset to date for dynamic hand gesture authentication is constructed, and an improved two-stream convolution network for dynamic hand gesture authentication is proposed. The dataset contains up to 29,160 dynamic hand gesture videos from 193 users, and comprehensive benchmark experiments and analyses are conducted on it to promote the development of vision-based dynamic hand gesture authentication. In addition, this dissertation analyzes the existing two-stream convolution network for dynamic hand gesture authentication in depth and proposes a series of improvements to it. The resulting improved two-stream convolution network achieves a remarkable performance improvement on the self-built dataset.

In summary, this dissertation studies three key vision-based hand gesture interaction techniques in depth: hand gesture recognition, hand pose estimation, and hand gesture authentication. It proposes a Fisher vector based static hand gesture representation, a feature covariance matrix based dynamic hand gesture representation, a joint graph reasoning and pixel-to-offset prediction network for 3D hand pose estimation, and an improved two-stream convolution network for dynamic hand gesture authentication. Experimental results on various types of hand gesture datasets demonstrate the effectiveness of the proposed methods.
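The covariance-based fusion described in contribution (2) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function name `covariance_descriptor`, the three-dimensional toy descriptors (x, y, speed), and the choice to vectorize the upper triangle of the covariance matrix are all assumptions made for illustration. The sketch only shows why such a descriptor is compact: its length depends on the descriptor dimension d, not on how many local descriptors the gesture produced.

```python
from typing import List, Sequence


def covariance_descriptor(descriptors: Sequence[Sequence[float]]) -> List[float]:
    """Fuse N local descriptors (each of length d) into one fixed-length vector.

    Returns the upper triangle (including the diagonal) of the d x d sample
    covariance matrix, so the output length is d * (d + 1) / 2 regardless of N.
    Hypothetical helper for illustration only.
    """
    n = len(descriptors)
    if n < 2:
        raise ValueError("need at least two local descriptors")
    d = len(descriptors[0])
    if any(len(row) != d for row in descriptors):
        raise ValueError("all local descriptors must have the same length")

    # Per-dimension mean over all local descriptors.
    mean = [sum(row[k] for row in descriptors) / n for k in range(d)]

    # Unbiased sample covariance; the matrix is symmetric, so only the
    # upper triangle is stored.
    out = []
    for i in range(d):
        for j in range(i, d):
            s = sum((row[i] - mean[i]) * (row[j] - mean[j]) for row in descriptors)
            out.append(s / (n - 1))
    return out


# Hypothetical toy data: per-frame descriptors (x, y, speed) of one tracked hand point.
short_gesture = [(0.0, 0.0, 1.0), (1.0, 2.0, 1.0), (2.0, 4.0, 1.0)]
long_gesture = short_gesture + [(3.0, 6.0, 2.0), (4.0, 8.0, 2.0)]

vec_short = covariance_descriptor(short_gesture)
vec_long = covariance_descriptor(long_gesture)
```

Both `vec_short` and `vec_long` have six entries even though the gestures differ in length, which is what allows a fixed-input classifier such as an SVM to be applied afterwards. Any further processing of the covariance matrix before classification is not specified in the abstract and is left out here.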
Keywords/Search Tags: Hand Gesture Interaction, Hand Gesture Recognition, Hand Pose Estimation, Hand Gesture Authentication, Image Processing, Computer Vision, Machine Learning, Deep Learning