
Research On Two-way Fusion Gesture Recognition Algorithm Based On Attention Mechanism

Posted on: 2022-06-20, Degree: Master, Type: Thesis
Country: China, Candidate: H J Fang
GTID: 2518306605968099, Subject: Computer Science and Technology
Abstract/Summary:
Traditional human-computer interaction relies on contact devices such as the mouse, keyboard, and touch screen. As science and technology develop, these methods can no longer meet people's growing interaction needs. Gestures are a common way for humans to convey information and meaning, and they are therefore of great significance to human-computer interaction. Progress in gesture recognition promotes innovation in the field and provides a more convenient, efficient, and human-centered way to interact in production and daily life.

This thesis examines the background and practical significance of the dynamic isolated gesture recognition task, studies the human visual attention mechanism derived from cognitive psychology and neuroscience, and surveys existing gesture recognition algorithms. On this basis it proposes GLA-2WI3D, a two-way fusion network algorithm based on a three-dimensional convolutional neural network and an attention mechanism. The GLA-2WI3D algorithm introduces the GLA (Global and Local Attention) structure, which combines local and global attention to increase the weight of hand features and thereby improve the accuracy of gesture recognition. The main work of the thesis is as follows:

(1) To handle the temporal dependence of dynamic gesture recognition, an extended three-dimensional convolutional neural network is used to extract spatial and temporal features simultaneously. The network uses the Inception module as its core, which uses dense structures in place of sparse connections, and combines it with three-dimensional convolution. This sparse-dense structure uses wider network layers to obtain more comprehensive gesture features, which helps improve the accuracy of dynamic gesture recognition.

(2) Because convolutional neural networks have local receptive fields, a global receptive field can be obtained only by stacking multiple convolutional layers. As a result, long-range features are difficult to correlate, and effective features cannot easily reinforce one another. To address this, the thesis designs dual attention modules that capture contextual information: a temporal-spatial attention module and a spatiotemporal channel attention module. The temporal-spatial attention module lets features reinforce one another across the spatiotemporal dimensions. The spatiotemporal channel attention module lets neurons express temporal and spatial features effectively, increasing the proportion of effective neurons.

(3) To address redundant information such as background, lighting, and environment in dynamic gesture recognition, a hand attention mechanism with two branches is designed. One branch adds hand features to the global features through a local hand attention module to increase the weight of hand features; the other branch uses only hand information to reduce the impact of redundant information. Combining the global dual attention mechanism with the local hand attention mechanism yields the GLA structure. The GLA structure promotes a distinctive expression of hand features and reduces the influence of redundant information on recognition accuracy. Using global and local attention together not only allocates limited computing resources to the key features but also produces results that are more consistent with human visual cognition.

(4) To overcome the limited feature expression of single-modality data, the thesis exploits the complementary characteristics of different modalities: RGB data carries richer texture information, while depth data better expresses the distance between objects and the imaging device. The thesis takes RGB and depth gesture video as input and proposes a two-way fusion gesture recognition algorithm based on the GLA structure and the I3D network, namely the GLA-2WI3D algorithm. The differing information in the two modalities is merged through element-wise product fusion to enrich the diversity of features. GLA-2WI3D uses the attention mechanism to promote the expression of hand information and enriches gesture-related features through two-way fusion from multiple angles. The algorithm reduces the influence of irrelevant factors such as background and environment and improves recognition accuracy. The recognition accuracy reported in this thesis on the IsoGD dataset is 73.50%.
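The element-wise product fusion of the RGB and depth streams can be illustrated with a minimal sketch. The abstract does not say at which layer the two streams are fused, so the code below assumes, purely for illustration, that each stream has already been pooled into a feature vector of the same length. The function names, the toy feature values, and the linear classifier weights are all hypothetical and are not taken from the thesis; plain Python lists stand in for the tensors a real implementation would use.

```python
import math


def elementwise_product_fusion(rgb_feat, depth_feat):
    """Fuse two equal-length feature vectors by multiplying them element by element."""
    if len(rgb_feat) != len(depth_feat):
        raise ValueError("both streams must produce features of the same length")
    return [r * d for r, d in zip(rgb_feat, depth_feat)]


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused_feat, class_weights):
    """Hypothetical linear classifier: one weight row per gesture class."""
    scores = [sum(w * f for w, f in zip(row, fused_feat)) for row in class_weights]
    return softmax(scores)


# Toy pooled features from the two streams (assumed values, not real data).
rgb_feat = [0.5, 2.0, 1.0]
depth_feat = [2.0, 0.5, 0.0]

fused = elementwise_product_fusion(rgb_feat, depth_feat)

# Two hypothetical gesture classes.
class_weights = [
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
probs = classify(fused, class_weights)
```

In this toy input the third feature is active only in the RGB stream, so the product sets it to zero. This shows one property of multiplicative fusion: a feature survives only where both modalities respond to it.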
Keywords/Search Tags: Gesture recognition, Attention mechanism, I3D, Multi-modal, Two-way fusion