
3D Gesture Modeling, Recognition and Learning for Social Robot

Posted on: 2020-04-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Q Kuang
Full Text: PDF
GTID: 1368330596975726
Subject: Control Science and Engineering
Abstract/Summary:
Social robots may reshape the world in the future, and natural human-robot interaction is one of their core technologies. As a natural interaction modality, gesture is widely used in social robot interaction systems. However, existing gesture recognition algorithms require large amounts of high-quality training data, and their learning processes are complicated and scale poorly, which makes them difficult to deploy in real application scenarios of social robots. Focusing on gesture interaction technology for social robots, this thesis studies how to train efficient gesture recognition algorithms from a small amount of labeled data, or even a single sample, and exploits the advantages of multi-modality fusion to further improve recognition performance. The main contributions of this thesis can be summarized as follows:

Since the hand is a high-DOF articulated structure, accurately labeled data are not always accessible, which makes existing supervised learning based algorithms extremely costly. To address this issue, this thesis develops a semi-supervised approach based on multi-view projection (sketched below). First, each unlabeled depth image is projected onto three orthogonal planes. Second, an autoencoder is trained to predict one projection from another, and its bottleneck feature is treated as an implicit pose representation in the latent space. Finally, the labeled samples are used to learn the mapping from this latent representation to the 3D joint positions. Experimental results show that the proposed method improves the previous best error of 19.60 mm to 17.04 mm while reducing the dependence on labeled data.

Traditional one-shot gesture recognition approaches often have the following limitations: 1) the frequently used motion-based features focus only on the motion phase and discard information from the holding phase, making the representation discontinuous; 2) feature extraction is not restricted to the effective hand regions, which introduces background noise; 3) the classification algorithms ignore spatiotemporal position information. To solve these problems, this thesis proposes a simple yet effective saliency feature extraction method that exploits context information. The approach preserves both the dynamic and the static information of a gesture, so richer and more robust features can be extracted. Furthermore, an improved dynamic time warping (DTW) algorithm based on feature matching is proposed (also sketched below): the similarity between two frames is measured by the density and accuracy of the feature matches, after which dynamic programming is applied to search for the optimal alignment of the two gesture sequences. The proposed algorithm not only guarantees the continuity and accuracy of the gesture representation but also takes full advantage of the spatiotemporal position information of the features. Experiments show that it achieves performance comparable to state-of-the-art approaches without requiring complicated hand-crafted feature descriptors.
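To make the two-stage semi-supervised scheme concrete, here is a minimal sketch (not the thesis code) of the multi-view projection idea in PyTorch: an autoencoder is trained on unlabeled data to predict one orthogonal projection from another, and its frozen bottleneck is reused as the latent pose representation that a small regressor, trained only on the scarce labeled samples, maps to 3D joints. The image size (96x96), latent dimension (64), network depth, and joint count (21) are illustrative assumptions, not values from the thesis.

import torch
import torch.nn as nn

class ViewPredictor(nn.Module):
    # Encode the front-view projection, decode a second orthogonal view.
    def __init__(self, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.ReLU(),   # 96 -> 48
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # 48 -> 24
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 24 -> 12
            nn.Flatten(),
            nn.Linear(64 * 12 * 12, latent_dim),                   # bottleneck
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 12 * 12), nn.ReLU(),
            nn.Unflatten(1, (64, 12, 12)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # 12 -> 24
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),  # 24 -> 48
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),              # 48 -> 96
        )

    def forward(self, front_view):
        z = self.encoder(front_view)       # implicit pose representation
        return self.decoder(z), z

# Stage 1: unsupervised view prediction on unlabeled depth images.
model = ViewPredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
front = torch.rand(8, 1, 96, 96)   # placeholder XY-plane projections
side = torch.rand(8, 1, 96, 96)    # placeholder YZ-plane projections
opt.zero_grad()
pred_side, _ = model(front)
nn.functional.mse_loss(pred_side, side).backward()
opt.step()

# Stage 2: a linear regressor maps the frozen latent code to 3D joints,
# trained only on the few labeled samples (21 joints assumed).
regressor = nn.Linear(64, 21 * 3)
with torch.no_grad():
    _, z = model(front)
joints_pred = regressor(z).view(-1, 21, 3)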
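The improved DTW can likewise be sketched. Here each frame carries a set of local descriptors (e.g., extracted from the salient hand region), the frame-to-frame cost is derived from the density and accuracy of descriptor matches, and the standard dynamic program then finds the optimal alignment. The ratio-test threshold and the cost formula are illustrative assumptions; the thesis's own matching criterion is not specified in this abstract.

import numpy as np

def match_cost(desc_a, desc_b, ratio=0.75):
    # Cost is low when many descriptors match densely and accurately.
    if len(desc_a) == 0 or len(desc_b) < 2:
        return 1.0
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    d_sorted = np.sort(d, axis=1)
    good = d_sorted[:, 0] < ratio * d_sorted[:, 1]   # Lowe-style ratio test
    return 1.0 / (1.0 + good.sum())                  # more matches -> lower cost

def dtw(seq_a, seq_b):
    # Standard DTW dynamic program over the feature-matching cost.
    n, m = len(seq_a), len(seq_b)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = match_cost(seq_a[i - 1], seq_b[j - 1])
            acc[i, j] = c + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m]

# Toy usage: two gesture sequences, each frame holding a few 32-D descriptors.
rng = np.random.default_rng(0)
query = [rng.normal(size=(12, 32)) for _ in range(20)]
template = [rng.normal(size=(10, 32)) for _ in range(25)]
print(dtw(query, template))   # smaller = more similar gesture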
Recent gesture recognition algorithms built on deep learning frameworks often require careful network design and tedious training, and the models must be retrained whenever new data are added. Taking these problems into consideration, this thesis proposes a unified multi-modal fusion framework named VDTW (Voting-based Dynamic Time Warping). First, a 3D implicit shape model (3D-ISM) characterizes the space-time structure of the local features extracted from the different modalities. Then, all votes from the local features are incorporated into a common probability space, which is used to build the DTW distance matrix. Meanwhile, a cheap upper-bounding method is proposed to speed up DTW (a generic sketch of bound-based pruning follows the system description below). Together, these components make VDTW suitable for large-scale multi-modal gesture classification tasks. Experiments on the ChaLearn IsoGD multi-modal gesture dataset demonstrate that the proposed algorithm achieves performance comparable to deep learning based methods.

Building on the above research, a social robot system named JIAJIA is constructed to validate the gesture interaction system in a household scenario. Multiple volunteers participated in the test, their feedback was satisfactory, and the quantitative results also demonstrate the practicability of the system.
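On the upper-bounding idea in VDTW: the abstract does not detail the thesis's bound, so the sketch below uses a generic stand-in, the cost of one fixed monotone warping path, which always upper-bounds the optimal DTW cost. Visiting templates in ascending order of this cheap bound yields a tight best-so-far early, so the exact DTW computations for most remaining templates can be abandoned partway through. The frame cost is assumed non-negative, which makes early abandoning sound; the pruning changes the runtime, not the 1-NN decision.

import numpy as np

def path_upper_bound(a, b, cost):
    # Cost along one monotone staircase path: a cheap upper bound on DTW.
    K = max(len(a), len(b))
    return sum(cost(a[k * len(a) // K], b[k * len(b) // K]) for k in range(K))

def dtw_exact(a, b, cost, abandon_at=np.inf):
    # DTW with early abandoning: stop once every cell of the current row
    # already exceeds the best distance seen so far.
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost(a[i - 1], b[j - 1]) + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
        if acc[i, 1:].min() > abandon_at:
            return np.inf                 # cannot beat the best-so-far
    return acc[n, m]

def nn_classify(query, templates, cost):
    # 1-NN over (sequence, label) templates, visited in ascending order of
    # the cheap bound so that most exact DTW runs abandon early.
    order = sorted(templates, key=lambda t: path_upper_bound(query, t[0], cost))
    best_label, best_dist = None, np.inf
    for seq, label in order:
        d = dtw_exact(query, seq, cost, abandon_at=best_dist)
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label

# Toy usage with a Euclidean frame cost over 16-D per-frame features.
cost = lambda x, y: float(np.linalg.norm(x - y))
rng = np.random.default_rng(1)
templates = [(rng.normal(size=(30, 16)), lbl) for lbl in range(5)]
print(nn_classify(rng.normal(size=(28, 16)), templates, cost))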
Keywords/Search Tags:gesture recognition, dynamic time warping, semi-supervised learning, one-shot learning, multi-modality