
Research On Dynamic Hand Action Recognition Based On Deep Neural Networks

Posted on: 2020-10-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Y Wang
GTID: 1368330572980582
Subject: Computer system architecture
Abstract/Summary:
Hand gestures are one of the most important means of communication in daily life and carry rich semantic information. They are an active human language widely used in human-computer interaction, virtual reality, and sign language for people with aphasia, and they are inherently friendly and intuitive. Hand gesture recognition using machine vision is therefore a valuable task. The research in this thesis provides new ideas and related methods for dynamic hand gesture recognition, and lays a technical foundation for future research in human-computer interaction and sign language recognition.

The hand is the most flexible part of the human body, which makes its detection, tracking, classification, and identification very challenging. Machine vision-based algorithms often face the following problems: 1) low resolution; 2) cluttered backgrounds around the hands; 3) hand-hand or hand-object interaction; 4) occlusion of the hand; 5) different gestures with similar appearance; 6) the many degrees of freedom of the hand; 7) multi-view ambiguity; 8) multiple shapes and scales; 9) training and fine-tuning the parameters of the detection and recognition neural networks.

This thesis comes from a self-selected project. The research applies machine vision technology to dynamic hand gesture recognition for human-computer interaction and sign language. The research topic is divided into the following four aspects: 1) A pixel-level skin area detection model (MFS) extracts the skin mask from the image, and a feature-indexed dictionary learning algorithm then refines the boundary contour of the skin mask so as to preserve as much skin area information as possible; 2) We propose a cascaded feature aggregation detection convolutional neural network (CCNN), in which the skin mask generated by the skin detection model supervises CCNN in detecting the hand location; 3) We extract the skeleton information of hands using the convolutional pose machine as the base detector, and then train a strong detector using both multi-view and monocular methods; 4) We represent the hand skeleton sequence as a spatial-temporal graph and propose the Hand Action GCN (HA-GCN) framework to recognize several dynamic gestures from human-computer interaction and sign language.

The innovations and research of this thesis are as follows:

(1) This thesis proposes a multi-feature skin mask detection method. Because background interference seriously affects hand detection, the background must be removed. This thesis therefore studies the features used in traditional pixel-level skin detection methods and selects those that contribute most to skin area detection. Instead of the traditional single-pixel detection method, we use superpixels (pixel clusters) to aggregate local appearance information. Scenes containing a human hand are captured with a first-person wearable device, and global clustering is applied to extract the mask of skin-region pixels. After the skin area mask is extracted, we propose a HOG feature-indexed dictionary learning algorithm that refines the boundary so as to retain more complete information, segmenting the skin area more accurately and producing a fine skin mask. This work lays a good foundation for the subsequent hand detection step.

(2) An object detection method (CCNN) based on a cascaded feature aggregation convolutional neural network is proposed. Based on the skin mask, hand proposals are generated to find the bounding boxes in the skin area that may contain hands. According to the geometrical characteristics of hands and k-means line methods, a positioning frame that may contain a hand is proposed from the skin region to supervise the receptive field of the hand detection network. To improve the robustness of the hand detector, this thesis focuses on the inadequacies of the real-time object detection framework SSD and cyclically aggregates the features of neighbouring layers for training and detection. The aggregated features preserve context information and solve the problem that each layer individually detects only hand targets of its corresponding scale.

(3) Based on the precise hand detection, we use the pose estimation framework Convolutional Pose Machine (CPM) as the base detector, and enhance it with multi-view and monocular methods for more robust keypoint extraction. Because of hand interaction, multi-view ambiguity, occlusion, and the many degrees of freedom of the hand, multi-view methods are used to detect the skeleton position from different angles. The N-best detection results (those closest to the true annotation) are reprojected onto the skeletons in the other views, and the base detector is then retrained, yielding a strong detector. The monocular detector trains a network, based on the above view information, to learn the triangulation of two-dimensional keypoints to the three-dimensional keypoint map in different views, and thus to infer the three-dimensional skeleton map.

(4) A dynamic skeleton-based gesture recognition method (HA-GCN) is proposed. Using the pose estimation framework, hand skeleton information is extracted from videos. The wrist keypoint of each frame is set as the root node, and a total of 11 keypoints are selected to represent the motion. From the keypoints of each frame and their connections, a spatial-temporal graph is constructed to represent the relationships in the hand motion. The GCN is then improved so that it can perform convolution on the spatial-temporal graph, achieving dynamic hand gesture recognition based on skeleton information; this is the dynamic gesture GCN recognition framework (HA-GCN).

Through the study of dynamic hand gesture recognition, this thesis provides innovative ideas and algorithms for the topic. However, some shortcomings remain and need further study: 1) The framework combines hand-crafted feature extraction with deep learning object detection, but contextual information (such as the end of the upper limb in body pose estimation, that is, the approximate position of the wrist from which the hand extends) may further improve the robustness of hand detection. 2) The accuracy of the pose estimation framework for hand pose estimation needs to be improved. 3) The HA-GCN framework, being based on skeleton information, relies on the accuracy of pose estimation; it could be combined with RGB video-based methods (such as the detection approach in this thesis) for higher precision. 4) HA-GCN is oriented to single-hand targets; future work should consider multi-hand targets and scenarios involving upper-body interaction. Running speed is also a focus of future research.
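
The spatial-temporal graph described in aspect (4) can be illustrated with a minimal sketch. The abstract does not say which 11 keypoints are chosen, how they are connected, or how the GCN is improved, so everything below is an assumption for illustration: the skeleton layout (wrist at index 0 plus two keypoints per finger), the names `SPATIAL_EDGES`, `spatial_temporal_adjacency`, `normalize`, and `graph_conv`, and the use of the standard symmetric-normalized graph convolution in place of the thesis's own HA-GCN layer.

```python
import numpy as np

NUM_KEYPOINTS = 11  # wrist (root) + 2 hypothetical keypoints per finger

# Hypothetical skeleton: index 0 is the wrist; finger f has a base joint at
# 1 + 2*f and a tip at 2 + 2*f. The thesis does not list its 11 keypoints.
SPATIAL_EDGES = [(0, 1 + 2 * f) for f in range(5)] + \
                [(1 + 2 * f, 2 + 2 * f) for f in range(5)]


def spatial_adjacency(num_nodes, edges):
    """Symmetric adjacency with self-loops for a single frame."""
    a = np.eye(num_nodes)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    return a


def spatial_temporal_adjacency(num_frames, num_nodes, edges):
    """Adjacency over num_frames * num_nodes graph nodes.

    Node (t, v) has index t * num_nodes + v. Spatial edges link joints
    within a frame; temporal edges link the same joint in adjacent frames.
    """
    n = num_frames * num_nodes
    a = np.zeros((n, n))
    a_frame = spatial_adjacency(num_nodes, edges)
    for t in range(num_frames):
        s = t * num_nodes
        a[s:s + num_nodes, s:s + num_nodes] = a_frame
        if t + 1 < num_frames:
            for v in range(num_nodes):
                a[s + v, s + num_nodes + v] = 1.0
                a[s + num_nodes + v, s + v] = 1.0
    return a


def normalize(a):
    """Symmetric normalization D^{-1/2} A D^{-1/2} (self-loops keep D > 0)."""
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(x, a_norm, w):
    """One generic graph convolution layer: ReLU(A_norm @ X @ W)."""
    return np.maximum(a_norm @ x @ w, 0.0)


# Example: 4 frames of 2-D keypoint coordinates mapped to 8 features per node.
rng = np.random.default_rng(0)
num_frames = 4
a = spatial_temporal_adjacency(num_frames, NUM_KEYPOINTS, SPATIAL_EDGES)
x = rng.normal(size=(num_frames * NUM_KEYPOINTS, 2))
w = rng.normal(size=(2, 8))
h = graph_conv(x, normalize(a), w)
```

Building one dense adjacency over all frames keeps the sketch short; for longer sequences, a separate temporal convolution along the frame axis would avoid the quadratic growth of the matrix.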
Keywords/Search Tags: Hand action recognition, machine vision, deep learning, object detection, pose estimation