
3D Gesture Modeling, Recognition and Learning for Social Robot

Posted on: 2020-04-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Q Kuang
Full Text: PDF
GTID: 1368330596975726
Subject: Control Science and Engineering
Abstract/Summary:
Social robots may reshape the world in the future, and natural human-robot interaction is one of their core technologies. As a natural interaction modality, gesture is widely used in social robot interaction systems. However, existing gesture recognition algorithms require large amounts of high-quality training data, and their learning processes are complicated and scale poorly, which makes them difficult to deploy in real application scenarios of social robots. Focusing on gesture interaction technology for social robots, this thesis studies how to train efficient gesture recognition algorithms from a small amount of labeled data, or even a single sample, and exploits the advantages of multi-modality fusion to further improve recognition performance. The main contributions of this thesis can be summarized as follows:

Since the hand is a high-DOF articulated structure, accurately labeled data are not always accessible, which makes existing supervised learning based algorithms extremely costly. To address this issue, this thesis develops a semi-supervised approach based on multi-view projection (sketched below). First, each unlabeled depth image is projected onto three orthogonal planes. Second, an autoencoder is trained to predict one projection from another, and its bottleneck feature is treated as an implicit pose representation in the latent space. Finally, the labeled samples are used to learn the mapping from this latent representation to the 3D joint positions. Experimental results show that the proposed method improves the previous best error of 19.60 mm to 17.04 mm while reducing the dependence on labeled data.

Traditional one-shot gesture recognition approaches often have the following limitations: 1) the frequently used motion-based features focus only on the motion phase and discard information from the holding phase, making the representation discontinuous; 2) feature extraction is not restricted to the effective hand regions, which introduces background noise; 3) the classification algorithms ignore spatiotemporal position information. To solve these problems, this thesis proposes a simple yet effective saliency feature extraction method that exploits context information. The approach preserves both the dynamic and the static information of a gesture, so richer and more robust features can be extracted. Furthermore, an improved dynamic time warping (DTW) algorithm based on feature matching is proposed (also sketched below): the similarity between two frames is measured by the density and accuracy of the feature matches, after which dynamic programming is applied to search for the optimal alignment of the two gesture sequences. The proposed algorithm not only guarantees the continuity and accuracy of the gesture representation but also takes full advantage of the spatiotemporal position information of the features. Experiments show that it achieves performance comparable to state-of-the-art approaches without requiring complicated hand-crafted feature descriptors.
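To make the two-stage semi-supervised scheme concrete, here is a minimal sketch (not the thesis code) of the multi-view projection idea in PyTorch: an autoencoder is trained on unlabeled data to predict one orthogonal projection from another, and its frozen bottleneck is reused as the latent pose representation that a small regressor, trained only on the scarce labeled samples, maps to 3D joints. The image size (96x96), latent dimension (64), network depth, and joint count (21) are illustrative assumptions, not values from the thesis.

import torch
import torch.nn as nn

class ViewPredictor(nn.Module):
    # Encode the front-view projection, decode a second orthogonal view.
    def __init__(self, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.ReLU(),   # 96 -> 48
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # 48 -> 24
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 24 -> 12
            nn.Flatten(),
            nn.Linear(64 * 12 * 12, latent_dim),                   # bottleneck
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 12 * 12), nn.ReLU(),
            nn.Unflatten(1, (64, 12, 12)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # 12 -> 24
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),  # 24 -> 48
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),              # 48 -> 96
        )

    def forward(self, front_view):
        z = self.encoder(front_view)       # implicit pose representation
        return self.decoder(z), z

# Stage 1: unsupervised view prediction on unlabeled depth images.
model = ViewPredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
front = torch.rand(8, 1, 96, 96)   # placeholder XY-plane projections
side = torch.rand(8, 1, 96, 96)    # placeholder YZ-plane projections
opt.zero_grad()
pred_side, _ = model(front)
nn.functional.mse_loss(pred_side, side).backward()
opt.step()

# Stage 2: a linear regressor maps the frozen latent code to 3D joints,
# trained only on the few labeled samples (21 joints assumed).
regressor = nn.Linear(64, 21 * 3)
with torch.no_grad():
    _, z = model(front)
joints_pred = regressor(z).view(-1, 21, 3)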
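The improved DTW can likewise be sketched. Here each frame carries a set of local descriptors (e.g., extracted from the salient hand region), the frame-to-frame cost is derived from the density and accuracy of descriptor matches, and the standard dynamic program then finds the optimal alignment. The ratio-test threshold and the cost formula are illustrative assumptions; the thesis's own matching criterion is not specified in this abstract.

import numpy as np

def match_cost(desc_a, desc_b, ratio=0.75):
    # Cost is low when many descriptors match densely and accurately.
    if len(desc_a) == 0 or len(desc_b) < 2:
        return 1.0
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    d_sorted = np.sort(d, axis=1)
    good = d_sorted[:, 0] < ratio * d_sorted[:, 1]   # Lowe-style ratio test
    return 1.0 / (1.0 + good.sum())                  # more matches -> lower cost

def dtw(seq_a, seq_b):
    # Standard DTW dynamic program over the feature-matching cost.
    n, m = len(seq_a), len(seq_b)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = match_cost(seq_a[i - 1], seq_b[j - 1])
            acc[i, j] = c + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m]

# Toy usage: two gesture sequences, each frame holding a few 32-D descriptors.
rng = np.random.default_rng(0)
query = [rng.normal(size=(12, 32)) for _ in range(20)]
template = [rng.normal(size=(10, 32)) for _ in range(25)]
print(dtw(query, template))   # smaller = more similar gesture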
Recent gesture recognition algorithms built on deep learning frameworks often require careful network design and tedious training, and the models must be retrained whenever new data are added. Taking these problems into consideration, this thesis proposes a unified multi-modal fusion framework named VDTW (Voting-based Dynamic Time Warping). First, a 3D implicit shape model (3D-ISM) characterizes the space-time structure of the local features extracted from the different modalities. Then, all votes from the local features are incorporated into a common probability space, which is used to build the DTW distance matrix. Meanwhile, a cheap upper-bounding method is proposed to speed up DTW (a generic sketch of bound-based pruning follows the system description below). Together, these components make VDTW suitable for large-scale multi-modal gesture classification tasks. Experiments on the ChaLearn IsoGD multi-modal gesture dataset demonstrate that the proposed algorithm achieves performance comparable to deep learning based methods.

Building on the above research, a social robot system named JIAJIA is constructed to validate the gesture interaction system in a household scenario. Multiple volunteers participated in the test, their feedback was satisfactory, and the quantitative results also demonstrate the practicability of the system.
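On the upper-bounding idea in VDTW: the abstract does not detail the thesis's bound, so the sketch below uses a generic stand-in, the cost of one fixed monotone warping path, which always upper-bounds the optimal DTW cost. Visiting templates in ascending order of this cheap bound yields a tight best-so-far early, so the exact DTW computations for most remaining templates can be abandoned partway through. The frame cost is assumed non-negative, which makes early abandoning sound; the pruning changes the runtime, not the 1-NN decision.

import numpy as np

def path_upper_bound(a, b, cost):
    # Cost along one monotone staircase path: a cheap upper bound on DTW.
    K = max(len(a), len(b))
    return sum(cost(a[k * len(a) // K], b[k * len(b) // K]) for k in range(K))

def dtw_exact(a, b, cost, abandon_at=np.inf):
    # DTW with early abandoning: stop once every cell of the current row
    # already exceeds the best distance seen so far.
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost(a[i - 1], b[j - 1]) + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
        if acc[i, 1:].min() > abandon_at:
            return np.inf                 # cannot beat the best-so-far
    return acc[n, m]

def nn_classify(query, templates, cost):
    # 1-NN over (sequence, label) templates, visited in ascending order of
    # the cheap bound so that most exact DTW runs abandon early.
    order = sorted(templates, key=lambda t: path_upper_bound(query, t[0], cost))
    best_label, best_dist = None, np.inf
    for seq, label in order:
        d = dtw_exact(query, seq, cost, abandon_at=best_dist)
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label

# Toy usage with a Euclidean frame cost over 16-D per-frame features.
cost = lambda x, y: float(np.linalg.norm(x - y))
rng = np.random.default_rng(1)
templates = [(rng.normal(size=(30, 16)), lbl) for lbl in range(5)]
print(nn_classify(rng.normal(size=(28, 16)), templates, cost))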
Keywords/Search Tags:gesture recognition, dynamic time warping, semi-supervised learning, one-shot learning, multi-modality