
Research On Vision Based Multimodal Dynamic Gesture Recognition Algorithm

Posted on: 2019-08-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Luo
GTID: 2428330566486093
Subject: Signal and Information Processing
Abstract/Summary:
As the key technology for gesture interaction, gesture recognition has a wide range of applications, such as assistance for deaf and mute people, drone control, virtual reality, and augmented reality. Because dynamic gestures change in both the temporal and the spatial dimensions, they offer more freedom of operation and a richer variety of movements, and so they are widely used in practice. At present, vision-based dynamic gesture recognition mainly faces three difficulties: interference from complex backgrounds, fusion of multimodal data, and the diversity of gesture movements. Given this application background and these difficulties, this thesis carries out the following research on visual dynamic gesture recognition based on RGB-D images.

To address multimodal data fusion and visual dynamic gesture recognition, this thesis proposes a recognition method based on a three-stream convolutional neural network (3SCNN). The inputs of 3SCNN are multimodal: two types of Pixel Change Probability Map (PCPM) generated from RGB-D image sequences, and the Enhanced Static Pose Map (eSPM) generated from depth image sequences. The PCPM describes inter-frame motion information, while the eSPM represents the spatial pose of the dynamic gesture. By combining 3D convolution and 3D pooling operations, the network extracts rich spatiotemporal features from dynamic gesture videos, replacing the traditionally complex process of designing features by hand. In addition, 3SCNN integrates three different sub-network streams by concatenating features and fusing the classification probabilities of two 3D-CNN streams, so that the multimodal data can be used effectively. The method achieves 52.09% accuracy on the validation subset and 58.22% accuracy on the test set of the IsoGD dataset (249 classes), and a 96.88% recognition rate on the SKIG dataset (10 classes). These results demonstrate the effectiveness of the proposed method.

This thesis also proposes a dynamic gesture recognition framework based on long short-term memory (LSTM) and a multi-scale convolutional neural network. The inputs of this network are again the two types of PCPM generated from RGB-D image sequences. The network is mainly composed of multi-scale convolutional structures and LSTM units. The multi-scale convolutional structure learns multi-level appearance features of dynamic gestures, while the convolution kernel decomposition and 1×1 convolutions inside it reduce the number of network parameters. The LSTM units mainly learn long-term, high-level spatiotemporal characteristics of dynamic gestures from the appearance features extracted by the multi-scale convolutional structures. The network achieves 56.21% accuracy on the validation subset and 60.58% accuracy on the test set of the IsoGD dataset, and a 98.02% recognition rate on the SKIG dataset. More importantly, the proposed network not only achieves better recognition results but also uses fewer parameters, which improves the method's prospects for practical application.
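The abstract does not say how a PCPM is computed. Purely as an illustration, the sketch below assumes a PCPM is the per-pixel fraction of consecutive frame pairs whose absolute intensity (or depth) difference exceeds a threshold. The function name `pixel_change_probability_map` and the `threshold` parameter are hypothetical and do not come from the thesis.

```python
from typing import List

Frame = List[List[float]]  # one grayscale or depth frame, indexed [row][col]


def pixel_change_probability_map(frames: List[Frame], threshold: float) -> Frame:
    """Hypothetical PCPM (assumed definition): for each pixel, the fraction of
    consecutive frame pairs in which its absolute change exceeds `threshold`."""
    if len(frames) < 2:
        raise ValueError("need at least two frames to measure change")
    rows, cols = len(frames[0]), len(frames[0][0])
    counts = [[0] * cols for _ in range(rows)]
    for prev, curr in zip(frames, frames[1:]):
        for r in range(rows):
            for c in range(cols):
                if abs(curr[r][c] - prev[r][c]) > threshold:
                    counts[r][c] += 1
    pairs = len(frames) - 1
    return [[counts[r][c] / pairs for c in range(cols)] for r in range(rows)]


# Three 2x2 frames: only the top-left pixel changes, and only between frames 0 and 1.
frames = [
    [[0.0, 5.0], [5.0, 5.0]],
    [[9.0, 5.0], [5.0, 5.0]],
    [[9.0, 5.0], [5.0, 5.0]],
]
print(pixel_change_probability_map(frames, threshold=2.0))  # [[0.5, 0.0], [0.0, 0.0]]
```

Running such a map once on the RGB sequence and once on the depth sequence would give two motion maps, which is one plausible reading of "two types of PCPM"; the thesis may define them differently.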
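The 3SCNN described above relies on 3D convolution and 3D pooling to capture spatiotemporal features. The naive sketch below shows the core idea: the kernel slides over time as well as height and width, and pooling then downsamples non-overlapping space-time blocks. It is a single-channel, stride-1, unpadded toy; real 3D-CNN layers add channels, biases, nonlinearities, and padding, and the kernel and pool sizes here are arbitrary rather than the thesis's configuration.

```python
from typing import List

Volume = List[List[List[float]]]  # indexed [time][row][col]


def conv3d_valid(x: Volume, k: Volume) -> Volume:
    """Single-channel 3D cross-correlation (as used in CNN layers), no padding, stride 1."""
    T, H, W = len(x), len(x[0]), len(x[0][0])
    kt, kh, kw = len(k), len(k[0]), len(k[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for i in range(H - kh + 1):
            row = []
            for j in range(W - kw + 1):
                s = 0.0
                # Sum over a kt x kh x kw window spanning several frames.
                for dt in range(kt):
                    for di in range(kh):
                        for dj in range(kw):
                            s += x[t + dt][i + di][j + dj] * k[dt][di][dj]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out


def maxpool3d(x: Volume, pt: int, ph: int, pw: int) -> Volume:
    """Non-overlapping 3D max pooling; trailing elements that do not fill a window are dropped."""
    T, H, W = len(x), len(x[0]), len(x[0][0])
    return [
        [
            [
                max(
                    x[t * pt + a][i * ph + b][j * pw + c]
                    for a in range(pt)
                    for b in range(ph)
                    for c in range(pw)
                )
                for j in range(W // pw)
            ]
            for i in range(H // ph)
        ]
        for t in range(T // pt)
    ]


# A 4x4x4 clip of ones convolved with a 2x2x2 kernel of ones gives 3x3x3 values of 8.0.
clip = [[[1.0] * 4 for _ in range(4)] for _ in range(4)]
kernel = [[[1.0] * 2 for _ in range(2)] for _ in range(2)]
features = conv3d_valid(clip, kernel)
pooled = maxpool3d(features, 1, 2, 2)  # pool space only, keep all 3 time steps
```

A kernel that spans two frames with weights -1 and +1 responds only where a pixel changes over time, which is the kind of motion cue a 2D convolution on a single frame cannot see.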
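The abstract says 3SCNN combines its streams in two ways: concatenating features and fusing the classification probabilities of two 3D-CNN streams. The exact wiring is not given, so the sketch below only illustrates the two generic operations: feature-level concatenation, and score-level (late) fusion as a weighted average of softmax outputs. The weight `alpha` and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def concat_features(*streams: Sequence[float]) -> List[float]:
    """Feature-level fusion: join per-stream feature vectors end to end."""
    fused: List[float] = []
    for s in streams:
        fused.extend(s)
    return fused


def fuse_probabilities(p_a: Sequence[float], p_b: Sequence[float], alpha: float = 0.5) -> List[float]:
    """Score-level fusion: convex combination of two class-probability vectors.
    `alpha` is a hypothetical weight, not a value from the thesis."""
    if len(p_a) != len(p_b):
        raise ValueError("probability vectors must have the same number of classes")
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(p_a, p_b)]


def predict(probs: Sequence[float]) -> int:
    """Index of the most probable class."""
    return max(range(len(probs)), key=lambda i: probs[i])


# Two streams disagree; the fused scores favor the class that both rank highly.
p_stream1 = softmax([2.0, 1.9, -1.0])
p_stream2 = softmax([0.5, 2.5, 0.0])
fused = fuse_probabilities(p_stream1, p_stream2)
print(predict(p_stream1), predict(p_stream2), predict(fused))  # 0 1 1
```

Averaging probabilities rather than taking a hard vote lets a confident stream outweigh a narrowly undecided one, which is why late fusion is a common way to combine modalities.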
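The parameter savings from kernel decomposition and 1×1 convolutions follow from simple counting. A k×k layer from C_in to C_out channels has k²·C_in·C_out weights; replacing it with a 1×k layer followed by a k×1 layer costs k·C_in·C_out + k·C_out², and a 1×1 bottleneck to C_mid channels before the k×k layer costs C_in·C_mid + k²·C_mid·C_out. The sketch below evaluates these counts for 2D layers with biases omitted; the channel widths are illustrative and are not the thesis's layer sizes.

```python
def conv_weights(kh: int, kw: int, c_in: int, c_out: int) -> int:
    """Weight count of one 2D convolution layer (biases omitted)."""
    return kh * kw * c_in * c_out


def full_kxk(k: int, c_in: int, c_out: int) -> int:
    """A plain k x k convolution."""
    return conv_weights(k, k, c_in, c_out)


def decomposed_kxk(k: int, c_in: int, c_out: int) -> int:
    """Kernel decomposition: 1 x k followed by k x 1, with c_out channels in between."""
    return conv_weights(1, k, c_in, c_out) + conv_weights(k, 1, c_out, c_out)


def bottleneck_kxk(k: int, c_in: int, c_mid: int, c_out: int) -> int:
    """A 1 x 1 reduction to c_mid channels, then a k x k convolution."""
    return conv_weights(1, 1, c_in, c_mid) + conv_weights(k, k, c_mid, c_out)


# Illustrative widths: 256 input channels, 64 output channels, 5x5 kernels.
print(full_kxk(5, 256, 64))            # 409600
print(decomposed_kxk(5, 256, 64))      # 102400
print(bottleneck_kxk(5, 256, 32, 64))  # 59392
```

The decomposition wins whenever C_in + C_out < k·C_in, and the bottleneck wins when C_mid is much smaller than C_in, which is why both tricks are common in multi-scale (Inception-style) blocks.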
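In the second framework, LSTM units learn long-term temporal structure from the per-frame appearance features produced by the multi-scale convolutional structures. The minimal pure-Python sketch below applies the standard LSTM cell equations (input, forget, and output gates plus a candidate cell state) to a short sequence of feature vectors. The weights are fixed toy values rather than learned ones, the feature and hidden sizes are arbitrary, and using the final hidden state as a clip descriptor is an assumption, not a detail from the thesis.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
Gate = Tuple[Matrix, Matrix, List[float]]  # (W: input weights, U: recurrent weights, b: bias)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(gate: Gate, x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Compute W x + U h + b for one gate."""
    W, U, b = gate
    return [
        sum(w * xi for w, xi in zip(W[r], x))
        + sum(u * hi for u, hi in zip(U[r], h))
        + b[r]
        for r in range(len(b))
    ]


def lstm_step(params: Dict[str, Gate], x, h, c):
    """One step of a standard LSTM cell; returns the new (hidden, cell) state."""
    i = [sigmoid(v) for v in affine(params["i"], x, h)]    # input gate
    f = [sigmoid(v) for v in affine(params["f"], x, h)]    # forget gate
    o = [sigmoid(v) for v in affine(params["o"], x, h)]    # output gate
    g = [math.tanh(v) for v in affine(params["g"], x, h)]  # candidate cell state
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def run_lstm(params: Dict[str, Gate], frames, hidden_size: int) -> List[float]:
    """Feed per-frame feature vectors in order; return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in frames:
        h, c = lstm_step(params, x, h, c)
    return h


def toy_gate(feat: int, hid: int, scale: float) -> Gate:
    """Deterministic placeholder weights; a real model learns these."""
    W = [[scale * (r + 1) * (k + 1) / 10.0 for k in range(feat)] for r in range(hid)]
    U = [[scale * 0.1] * hid for _ in range(hid)]
    return W, U, [0.0] * hid


params = {name: toy_gate(3, 2, s) for name, s in [("i", 1.0), ("f", 0.5), ("o", 1.0), ("g", -1.0)]}
# Hypothetical 3-dimensional appearance features for a 3-frame clip.
frames = [[0.2, 0.1, 0.0], [0.5, 0.4, 0.3], [0.9, 0.8, 0.7]]
clip_descriptor = run_lstm(params, frames, hidden_size=2)
```

The forget gate `f` scales how much of the previous cell state survives each step, which is what lets the cell carry information across many frames.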
Keywords: dynamic gesture recognition, 3D convolutional neural network, long short-term memory, multi-scale convolutional neural network