
Action Recognition And Hand Pose Estimation In Images And Depth Maps

Posted on: 2020-01-29    Degree: Doctor    Type: Dissertation
Country: China    Candidate: R Li    Full Text: PDF
GTID: 1368330572482082    Subject: Mechanical design and theory
Abstract/Summary:
Making computers understand human activities and behaviors is a prerequisite of human-computer interaction and human-robot cooperation. The fast development of machine learning theories has made vision-based human activity analysis increasingly mature. In some fields the state of the art already satisfies practical demands, e.g. fingerprint recognition and face recognition; other fields, e.g. action recognition and hand pose estimation, are not yet mature. As two important branches of human activity analysis, action recognition and hand pose estimation provide theoretical solutions for applications in human-computer interaction and human-robot cooperation. This dissertation studies action recognition and hand pose estimation in images and depth maps:

Two still-image action recognition methods are proposed, which achieve spatial modeling through hierarchical image representation. The first method uses SIFT as a local descriptor, applies Fisher vectors to encode the SIFT features, and takes a spatial pyramid as the strategy of hierarchical representation. The second method uses eight recent pretrained deep networks to extract features, and a division with overlapped regions serves as the strategy of hierarchical representation.

An offline action recognition method is derived from supervised time-series segmentation. The method is built on the framework of structured time series, treats each human skeleton as a point in a multidimensional space, and uses DTW (Dynamic Time Warping) to handle variation in action velocity. Each training sequence serves as an atom in a dictionary for collaborative representation by ridge regression, and action classification is achieved through the reconstruction errors of the collaborative representations. Since the reconstruction errors continuously measure the similarity between a test sequence and the training sequences in the l2-norm sense, a supervised time-series segmentation algorithm is further proposed. The proposed algorithm applies not only to offline action recognition but also to motion sequence segmentation and general sequence segmentation.

Two online action recognition methods are proposed, which combine depth maps and 3D skeletal sequences. The first method uses pairwise relative joint positions in the 3D skeletal sequences to describe human poses, and the LOP (Local Occupancy Pattern), computed from the depth maps, to characterize object shape. For each action, K-SVD is used to learn a class-specific dictionary from the training sequences; the dictionaries can be treated as a compact representation of the redundant training sequences. Frame-wise action recognition is achieved by regularized linear regression. The second method uses the DMM (Depth Motion Map) to describe actions. To extend the traditional DMM to online action recognition, an offline random segmentation algorithm and an online sequential segmentation algorithm are proposed for generating the sub-sequences needed by the DMM. To enhance the discriminative power of the DMM for static actions and for actions that differ only in the temporal order of human poses, 3D skeletal positions and skeletal velocities are introduced into the feature vectors as complementary descriptors.

A hand pose estimation method based on deep residual networks is proposed. To highlight the improvement brought by a residual module, an ordinary deep network is first designed and the influence of batch normalization on it is analyzed. The residual module is then introduced on top of the ordinary deep network, and the resulting deep residual network is further optimized with respect to network width and network depth. The effect of a bottleneck layer is also studied.

A method for evaluating the dynamic tracking capability of depth cameras is proposed. Traditional measurement methods focus on the static precision of depth cameras, but the main concern of action recognition and hand pose estimation is dynamic tracking capability. To explore whether depth cameras have become a hardware bottleneck restricting the development of action recognition and hand pose estimation, the influence on tracking accuracy of the object position relative to the camera, the motion velocity, and the motion direction is systematically studied with the aid of a numerically controlled linear motion guide. The Kinect v2 and the Intel RealSense SR300 are taken as two examples of depth cameras.

Evaluation experiments on benchmark datasets demonstrate that almost all of the proposed action recognition and hand pose estimation methods are comparable with state-of-the-art methods, and some even break the existing best records. The dynamic tracking measurement experiments demonstrate that, even without considering hand detection, depth cameras will soon be a non-negligible factor restricting the progress of hand action recognition and hand pose estimation. To keep pace with emerging methods, it is necessary to publish new benchmark datasets with more accurate annotations, captured by higher-precision depth cameras.
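The hierarchical spatial representation behind the still-image methods can be illustrated with a spatial pyramid. The sketch below is a minimal toy, not the dissertation's SIFT + Fisher-vector pipeline: it max-pools a single 2-D response map over a 1x1, 2x2, and 4x4 grid and concatenates the cell maxima, so finer levels retain more spatial layout.

```python
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Pool a 2-D feature map over a spatial pyramid.

    At level L the map is split into an L x L grid and each cell is
    max-pooled; concatenating all cells keeps coarse-to-fine layout.
    """
    h, w = feature_map.shape
    pooled = []
    for l in levels:
        for i in range(l):
            for j in range(l):
                cell = feature_map[i * h // l:(i + 1) * h // l,
                                   j * w // l:(j + 1) * w // l]
                pooled.append(cell.max())
    return np.array(pooled)

# a toy 8x8 "response map"
fmap = np.arange(64, dtype=float).reshape(8, 8)
desc = spatial_pyramid_pool(fmap)   # 1 + 4 + 16 = 21 pooled values
```

In the real pipeline the per-cell statistic would be a Fisher vector over the SIFT descriptors falling in that cell, not a scalar maximum.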
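The role DTW plays against action-velocity variation can be seen in a small sketch. The implementation below is the textbook dynamic-programming recurrence, applied here to 1-D toy "skeletons"; the dissertation applies it to skeletons as points in a multidimensional space.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Dynamic Time Warping distance between two sequences of
    d-dimensional points (e.g. skeletons flattened to vectors).

    Warping absorbs speed differences: the same motion performed
    slower or faster still maps to a small distance.
    """
    a, b = np.asarray(seq_a, float), np.asarray(seq_b, float)
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[n, m]

slow = [[0.], [0.], [1.], [1.], [2.], [2.]]  # same motion at half speed
fast = [[0.], [1.], [2.]]
d = dtw_distance(slow, fast)                 # warping aligns them: 0.0
```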
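Classification by reconstruction errors of a ridge-based collaborative representation can be sketched in closed form. Everything below (dimensions, data, the regularization weight) is illustrative, but the mechanism matches the text: training atoms form the dictionary columns, the query is coded over all atoms at once, and the class whose atoms reconstruct the query with the smallest l2 residual wins.

```python
import numpy as np

def crc_classify(train_atoms, labels, query, lam=0.01):
    """Collaborative-representation classification with a ridge solver.

    alpha = (A^T A + lam I)^{-1} A^T y codes the query over ALL atoms;
    the query is then reconstructed class by class and the smallest
    residual decides the label.
    """
    A = np.asarray(train_atoms, float)   # (dim, n_atoms)
    labels = np.asarray(labels)
    alpha = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]),
                            A.T @ query)
    errors = {}
    for c in np.unique(labels):
        mask = labels == c
        errors[c] = np.linalg.norm(query - A[:, mask] @ alpha[mask])
    return min(errors, key=errors.get)

# toy dictionary: class 0 atoms cluster near e1, class 1 near e2
rng = np.random.default_rng(0)
atoms = np.concatenate(
    [rng.normal(0, 0.1, (5, 4)) + np.eye(5)[:, :1],
     rng.normal(0, 0.1, (5, 4)) + np.eye(5)[:, 1:2]], axis=1)
labels = [0] * 4 + [1] * 4
pred = crc_classify(atoms, labels, np.eye(5)[:, 0])  # query near e1
```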
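The LOP descriptor of the first online method can be sketched as voxel counting. The grid size and radius below are arbitrary toy values; the idea is the one stated in the text: occupancy of depth points in a small cube around a joint characterizes the shape of an interacting object.

```python
import numpy as np

def local_occupancy_pattern(points, joint, radius=0.2, grid=4):
    """Local Occupancy Pattern around one skeletal joint: voxelize a
    cube of side 2*radius centred on the joint and count the depth
    points falling into each voxel.
    """
    offsets = (np.asarray(points, float) - joint + radius) / (2 * radius)
    bins = np.floor(offsets * grid).astype(int)
    lop = np.zeros((grid, grid, grid))
    for b in bins:
        if np.all((0 <= b) & (b < grid)):  # ignore points outside the cube
            lop[tuple(b)] += 1
    return lop.ravel()

joint = np.array([0.0, 0.0, 0.0])
points = np.array([[0.05, 0.05, 0.05],   # inside the cube
                   [1.0, 1.0, 1.0]])     # far away, ignored
lop = local_occupancy_pattern(points, joint)
```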
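The DMM used by the second online method can be sketched as accumulated frame differencing. The toy below keeps only the front view (the full DMM also projects each depth frame onto side and top views before differencing), with an assumed threshold: moving regions accumulate counts and the static background stays at zero, which is exactly why the descriptor struggles with static actions and needs the skeletal complements described in the text.

```python
import numpy as np

def depth_motion_map(frames, threshold=1.0):
    """Front-view Depth Motion Map: count, per pixel, how often
    consecutive depth frames differ by more than a threshold.
    """
    frames = np.asarray(frames, float)
    diffs = np.abs(np.diff(frames, axis=0))  # frame-to-frame changes
    return (diffs > threshold).sum(axis=0)

# 4 frames of a 4x4 depth map: two pixels move briefly, the rest is static
frames = np.zeros((4, 4, 4))
frames[1, 1, 1] = 10.0
frames[2, 2, 2] = 10.0
dmm = depth_motion_map(frames)
```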
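The residual module underlying the hand pose network can be sketched as a forward pass y = x + F(x). The two-layer F below is a stand-in, not the dissertation's architecture; the point it illustrates is why residual stacks ease optimization: with zero weights the block reduces to the identity, so deep layers only need to learn a correction to the identity mapping.

```python
import numpy as np

def residual_block(x, w1, w2):
    """Forward pass of a plain residual block: y = x + F(x),
    with F a two-layer ReLU transform."""
    hidden = np.maximum(w1 @ x, 0.0)   # ReLU nonlinearity
    return x + w2 @ hidden             # identity skip connection

x = np.array([1.0, -2.0, 3.0])
zero = np.zeros((3, 3))
y = residual_block(x, zero, zero)      # zero weights -> identity
```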
Keywords/Search Tags: Action Recognition, Motion Segmentation, Hand Pose Estimation, Depth Camera, Human-Computer Interaction, Human-Robot Cooperation