
Research On Dynamic Gesture Recognition Algorithm Based On S3D+BiConvLSTM+MobileNet

Posted on: 2021-05-11
Degree: Master
Type: Thesis
Country: China
Candidate: G S Fan
GTID: 2518306569997929
Subject: IC Engineering
Abstract/Summary:
As an intuitive and natural means of interaction, gestures have long attracted the attention of researchers in human-computer interaction. Dynamic gesture recognition nevertheless remains a challenging research topic because of variable environmental conditions, inconsistent behavior across performers, and differences in how gestures are distributed over time. With the rapid development of deep learning, many convolutional neural networks have been applied to dynamic gesture recognition. However, current models are usually complex, have large numbers of parameters, and demand substantial computing power, which hinders practical deployment.

In response to this problem, we propose a new cascaded S3D+BiConvLSTM+MobileNet network structure for dynamic gesture recognition, which achieves higher recognition accuracy with a small number of model parameters. The model first uses the S3D network to extract short-term spatio-temporal features: it applies 3D convolutions with kernels of different sizes to widen the network, and replaces standard 3D convolutions inside the module with depthwise separable convolutions to reduce the number of convolutional parameters. Next, a BiConvLSTM variant extracts long-term spatio-temporal features from both forward and backward temporal context; in its input, output, and forget gates, pooling and fully connected operations replace the convolution operation, reducing the spatial dimension. Finally, the lightweight MobileNet network extracts higher-level spatio-temporal features. After the S3D and BiConvLSTM stages, the video stream is encoded as 2D feature maps, which are fed into MobileNet for downsampling; this greatly reduces convolution time and memory use.

After the network design is completed, the model is optimized by comparing several different optimizers and by combining anti-overfitting strategies such as regularization and Dropout to improve performance. In addition, a channel attention mechanism is introduced into the input gate of the BiConvLSTM: the maximum element of each channel serves as the score used to filter the input features, which effectively improves recognition accuracy.

In this dissertation, the scheme is tested and evaluated on large gesture datasets and compared with state-of-the-art gesture recognition methods. The experiments show that the proposed model clearly outperforms previous methods, achieving a recognition accuracy of 94.9% on the Jester dataset and 45.88% on the IsoGD-RGB dataset. It effectively recognizes dynamic gestures in video while using 70% fewer model parameters than state-of-the-art methods. A real-time gesture recognition system is also built, which recognizes gestures both in local video files and in live video. It takes 96 ms to recognize a dynamic gesture, which demonstrates the practical value of the model.
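
The parameter saving from replacing a standard 3D convolution with a depthwise separable one can be illustrated by counting weights. The sketch below is not the thesis's implementation: it assumes a depthwise 3x3x3 convolution followed by a 1x1x1 pointwise convolution, ignores biases, and uses hypothetical channel counts (64 in, 128 out). The function names are invented for illustration.

```python
def conv3d_params(c_in, c_out, kernel=(3, 3, 3)):
    """Weight count of a standard 3D convolution (bias ignored)."""
    kt, kh, kw = kernel
    return c_in * c_out * kt * kh * kw


def depthwise_separable_conv3d_params(c_in, c_out, kernel=(3, 3, 3)):
    """Weight count of a depthwise 3D convolution followed by a
    1x1x1 pointwise convolution (bias ignored)."""
    kt, kh, kw = kernel
    depthwise = c_in * kt * kh * kw  # one kt x kh x kw filter per input channel
    pointwise = c_in * c_out         # 1x1x1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
standard = conv3d_params(64, 128)
separable = depthwise_separable_conv3d_params(64, 128)
reduction = 1 - separable / standard

print(f"standard:  {standard} weights")
print(f"separable: {separable} weights")
print(f"reduction: {reduction:.1%}")
```

This ratio describes a single layer under the stated assumptions. It is separate from the 70% whole-model parameter reduction reported in the abstract.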
Keywords/Search Tags: dynamic gesture recognition, network structure, feature extraction