Research Of Gesture Recognition Based On Deep Learning And Sparse Representation

Posted on: 2015-09-13
Degree: Master
Type: Thesis
Country: China
Candidate: H T Hong
GTID: 2308330464467952
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the development of computer science, human-computer interaction has become an important research area in computer technology. Gesture recognition, as part of understanding human language, plays a very important role. On the one hand, it is a primary means of human-computer interaction in virtual reality; on the other hand, it is a main tool that lets deaf people communicate with hearing people through computers. Gesture recognition has therefore attracted growing attention from researchers. In recent years, a growing number of researchers have applied computer vision methods to gesture recognition and achieved certain results. Computer vision-based methods capture gestures directly through cameras, so they are not constrained by equipment. However, many difficulties remain. In static gesture recognition, the background environment makes it very hard to segment the exact contour of the hand. In dynamic gesture recognition, tracking the gesture reliably and segmenting it are equally challenging. This thesis studies computer vision-based gesture recognition in depth by combining knowledge from image processing, computer vision, machine learning, and related fields. It proposes a static gesture recognition method that combines supervised and unsupervised learning, and a dynamic gesture recognition method based on sparse coding. The main contributions of this thesis can be summarized as follows:

1. For static gesture recognition, supervised and unsupervised learning are combined. First, initial weights are obtained from a sparse auto-encoder trained on RGB patches randomly sampled from the original images. These weights are then used as convolution kernels to convolve the original RGB images and obtain local image features. Finally, a pooling operation extracts global image features, which also makes recognition faster and more accurate. This approach uses the self-learning capability of deep learning to avoid a gesture image segmentation step and thus reduces the effect of the background environment on recognition. In addition, the method reduces the dimensionality of the features, which improves both recognition speed and recognition rate.

2. Devices that can capture depth information, such as Kinect, are becoming increasingly common. Compared with a color image, a depth image can characterize the three-dimensional features of an object's surface and is not affected by color, shading, lighting, and other such factors.

3. For dynamic gesture recognition, a novel model based on sparse representation is proposed in this thesis. First, spatial-temporal interest points are detected in the video sequences with a 3D corner detection method. Second, a cuboid is constructed around each spatial-temporal interest point, and the 3D spatial-temporal descriptor extracted from the cuboid is taken as the feature of the video at that location. Third, the local 3D spatial-temporal descriptors are sparse coded with a dictionary trained beforehand on all the 3D histogram of orientation features, so that each local descriptor is expressed as a linear combination of a few atoms of the pre-trained dictionary. Finally, max pooling produces the final representation of a video, and a multi-class linear SVM performs the classification. This method requires neither gesture segmentation nor a complex mathematical modeling process, which greatly reduces the complexity of dynamic gesture recognition.

The results above are studies of gesture recognition from the perspective of computer vision, which are forward-looking and full of challenges. This thesis makes some breakthroughs in theory and some innovations in technology, and it opens up a new approach to gesture recognition that has important theoretical significance and application value.
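
The sparse coding and max pooling steps of the dynamic gesture method can be illustrated with a minimal sketch. The abstract does not say which solver, penalty, or pooling variant the thesis uses, so everything here is an assumption made for illustration: the ISTA solver, the `l1_penalty` value, pooling over absolute coefficient values, the toy dictionary and descriptors, and all function names. The final multi-class linear SVM is left out.

```python
# Hypothetical sketch: the solver (ISTA), parameters, and names below are
# illustrative assumptions, not details taken from the thesis.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def soft_threshold(value, threshold):
    """Shrink a scalar toward zero by `threshold` (proximal step for the L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def sparse_encode(descriptor, atoms, l1_penalty=0.1, n_iter=200):
    """Approximately minimise 0.5*||x - sum_j a_j*d_j||^2 + l1_penalty*||a||_1 with ISTA."""
    # The squared Frobenius norm of the dictionary upper-bounds the Lipschitz
    # constant of the gradient, so its reciprocal is a safe step size.
    step = 1.0 / sum(dot(atom, atom) for atom in atoms)
    code = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Residual between the descriptor and its current reconstruction.
        residual = list(descriptor)
        for coeff, atom in zip(code, atoms):
            if coeff != 0.0:
                residual = [r - coeff * a for r, a in zip(residual, atom)]
        # Gradient step on the squared error, then soft-thresholding.
        code = [
            soft_threshold(coeff + step * dot(atom, residual), step * l1_penalty)
            for coeff, atom in zip(code, atoms)
        ]
    return code

def max_pool(codes):
    """Pool per-descriptor codes into one video-level vector (max absolute value per atom)."""
    return [max(abs(code[j]) for code in codes) for j in range(len(codes[0]))]

# Toy dictionary with three unit-norm atoms in a 3-D descriptor space.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Two toy descriptors, each standing in for one cuboid around an interest point.
descriptors = [[2.0, 0.05, 0.0], [0.0, 0.0, -1.5]]

codes = [sparse_encode(d, atoms) for d in descriptors]
video_feature = max_pool(codes)
print(video_feature)
```

The pooled vector has one entry per dictionary atom regardless of how many interest points a video contains, which is what would let videos of different lengths be passed to a single linear classifier.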
Keywords/Search Tags: Sparse auto-encoder, Convolution, Pooling, Spatial-temporal Interest Points, 3D Spatial-temporal Descriptor, Sparse Coding