
Multimodal Dynamic Gesture Recognition Based On Spatiotemporal Model

Posted on: 2020-11-13
Degree: Master
Type: Thesis
Country: China
Candidate: X D Qi
GTID: 2428330602452131
Subject: Engineering
Abstract/Summary:
With the rapid development of intelligent electronic devices, the way humans interact with computers is changing quickly. Gestures are a common means of human expression, so gesture recognition technology is important to the field of human-computer interaction. It is expected to open a new chapter in human-computer interaction by giving people a more convenient and more natural way to interact. This thesis studies algorithms for dynamic gesture recognition and proposes a new algorithm that improves both the efficiency and the accuracy of recognition, with the aim of promoting its practical application.

The thesis first introduces the research significance and current state of research on gesture recognition and summarizes the relevant theory of deep learning. It then analyzes the two mainstream approaches to gesture recognition: the two-stream method and three-dimensional convolutional neural networks (3D CNNs). To address the low efficiency and accuracy of existing algorithms, it combines the advantages of the two-stream method and 3D CNNs and proposes a multimodal dynamic gesture recognition algorithm based on a spatiotemporal model. The main work is as follows:

(1) To address the low efficiency of dynamic gesture recognition, this thesis introduces depthwise separable convolution into 3D CNNs and designs a new spatiotemporal network in combination with BiConvLSTM and ShuffleNet. The network is well suited to spatiotemporal data and is also highly efficient. In addition, because the number of frames captured in a gesture video varies, a key frame extraction method based on inter-frame difference is proposed to unify the number of frames per video. Experiments show that the network improves the efficiency of gesture recognition while also improving its accuracy.

(2) Optical flow greatly improves the accuracy of gesture recognition, but extracting optical flow frames from the original video frames incurs a large computation and storage cost. To solve this problem, this thesis introduces TVNet, an optical flow method based on convolutional neural networks (CNNs), and designs an end-to-end optical flow generation method. The optical flow frames generated this way are better suited to gesture recognition than those of traditional methods, and they incur no additional storage cost.

(3) To further improve the accuracy of gesture recognition, this thesis proposes a multimodal dynamic gesture recognition algorithm based on the spatiotemporal model that uses the RGB, depth, and optical flow modalities. First, histogram equalization is applied to enhance RGB videos in which the gesture is partially dark, and clustering is used to remove the noise present in the depth videos. Then the RGB, depth, and optical flow data are fed into the network to extract features, and the fused features serve as the basis for classification.

The proposed algorithm is evaluated on the public gesture recognition dataset IsoGD. First, the new spatiotemporal network is compared with other leading algorithms; the results show that it improves recognition efficiency without reducing accuracy. Next, the other improvements are compared and analyzed. Finally, the multimodal algorithm based on the spatiotemporal model is compared with other algorithms on the IsoGD dataset.
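
The key frame extraction step in item (1) can be illustrated with a short sketch. The abstract does not give the exact selection rule, so everything below is an assumption made for illustration: the function name `select_key_frames`, scoring each frame by the mean absolute pixel difference from its predecessor, keeping the highest-scoring frames, and padding short videos by repeating the last frame. It is not the thesis's implementation.

```python
import numpy as np


def select_key_frames(frames: np.ndarray, target_len: int) -> np.ndarray:
    """Return indices of `target_len` frames, in temporal order.

    `frames` is an array of shape (num_frames, ...), e.g. grayscale images.
    Hypothetical rule (not taken from the thesis): keep the frames that
    differ most from their predecessor, and always keep the first frame.
    """
    n = len(frames)
    if n == 0:
        raise ValueError("video has no frames")
    if n <= target_len:
        # Too short: keep every frame and repeat the last one as padding.
        pad = np.full(target_len - n, n - 1, dtype=int)
        return np.concatenate([np.arange(n), pad])

    # Mean absolute difference between each frame and the one before it.
    pixels = frames.astype(np.float32)
    diffs = np.abs(pixels[1:] - pixels[:-1]).reshape(n - 1, -1).mean(axis=1)

    # Score frame i by the change leading into it; frame 0 gets +inf so
    # that it is always selected.
    scores = np.concatenate([[np.inf], diffs])

    # Take the highest-scoring frames, then restore temporal order.
    keep = np.argsort(-scores, kind="stable")[:target_len]
    return np.sort(keep)
```

Calling `frames[select_key_frames(frames, 32)]` would then give every video the same length of 32 frames; the value 32 is only an example, not a setting reported in the thesis.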
Keywords/Search Tags: Dynamic gesture recognition, spatiotemporal model, depthwise separable convolution, multimodal, feature fusion