
Research On Dynamic Gesture Recognition Based On Deep Learning

Posted on: 2021-05-02
Degree: Master
Type: Thesis
Country: China
Candidate: H F Tao
Full Text: PDF
GTID: 2518306560453564
Subject: Computer Science and Technology
Abstract/Summary:
Gestures are a natural form of human communication and are of great significance in human-computer interaction. With the rapid development of artificial intelligence, dynamic gestures, which fit people's everyday habits, have become increasingly prominent in human-computer interaction. They allow people to interact in a more natural and direct way and make daily life faster and more convenient. Dynamic gesture recognition has therefore become an important subject in human-computer interaction research.

In dynamic gesture recognition methods based on deep learning, the extraction of gesture features is the key step. However, because dynamic gestures contain both spatial and temporal features, most models cannot extract spatiotemporal features effectively at the same time, and cannot capture dependencies between data separated by long time intervals. In addition, models often rely on conventional convolution alone, which leads to a single mode of feature extraction and insufficient feature information. Most networks for dynamic gesture recognition are also deep networks, so feature loss and vanishing gradients occur and seriously degrade model quality. To address these problems, this paper proposes a new gesture recognition architecture that combines a feature fusion network with a variant of ConvLSTM. The main contributions are as follows:

(1) To address the problem that spatial and temporal feature information cannot be extracted effectively at the same time, this paper uses 3D convolution to extract spatiotemporal features jointly, and combines it with the ResNet structure to alleviate the feature loss caused by deepening the network. Within the ResNet structure, a channel merge operation is introduced to shorten the connection between the input layer and the output layer, and a parallel fusion local feature extraction module is constructed to improve the extraction of local spatiotemporal features.

(2) To address the problem of long-term dependence in video data, this paper proposes an improved VConvLSTM module that builds on the temporal modeling and spatial feature description capabilities of the ConvLSTM structure. Reducing the convolution operations of the three gates lowers the computational cost and improves the extraction of global spatiotemporal features.

(3) To address the lack of deep feature information in dynamic gesture features, a depthwise separable structure, which differs from traditional convolution, is improved, and a deep feature extraction module is constructed to extract deep gesture features. This addresses the problem that most models use only traditional convolution, whose feature extraction is simple and yields insufficient feature information. A channel merge operation is also added to the deep feature extraction module to alleviate the loss of feature information.

Because this paper introduces the ideas of ResNet and channel merging in both the local spatiotemporal extraction module and the deep feature extraction module, the two networks are collectively referred to as a fusion network. In experiments on the SKIG and Jester datasets, the proposed deep learning architecture achieves accuracies of 99.70% and 95.68%, respectively. The experimental results show that the proposed architecture can effectively improve the accuracy of dynamic gesture recognition, and in comparison with other advanced models the proposed model retains certain advantages.
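The abstract contrasts traditional convolution with a depthwise separable structure. As a rough illustration of why the two differ, the sketch below counts the weights of a standard 2D convolution and of the generic depthwise separable factorization (a per-channel depthwise filter followed by a 1x1 pointwise convolution). The function names and the channel and kernel sizes are hypothetical and chosen only for illustration; they are not taken from the thesis, and the thesis's improved structure may differ from this textbook form.

```python
# Illustrative weight counts only; the layer sizes below are assumptions,
# not values reported in the thesis.

def standard_conv_params(c_in, c_out, k, bias=False):
    """Weights of a standard k x k convolution mapping c_in to c_out channels."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def depthwise_separable_params(c_in, c_out, k, bias=False):
    """Weights of a depthwise k x k convolution plus a 1x1 pointwise convolution."""
    depthwise = c_in * k * k + (c_in if bias else 0)   # one k x k filter per input channel
    pointwise = c_in * c_out + (c_out if bias else 0)  # 1x1 convolution mixes the channels
    return depthwise + pointwise


# Hypothetical layer: 64 input channels, 128 output channels, 3x3 kernel.
c_in, c_out, k = 64, 128, 3
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
ratio = sep / std  # equals 1/c_out + 1/k**2 when bias terms are omitted

print(f"standard: {std}, depthwise separable: {sep}, ratio: {ratio:.3f}")
```

For this hypothetical layer the separable form needs roughly 12% of the weights of the standard convolution, which is the usual motivation for using it as a complementary, lightweight feature extractor.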
Keywords/Search Tags:dynamic gesture recognition, feature extraction, 3D convolutional network, ConvLSTM, depthwise separable convolution