Research On Two-way Fusion Gesture Recognition Algorithm Based On Attention Mechanism

Posted on:2022-06-20

Degree:Master

Type:Thesis

Country:China

Candidate:H J Fang

Full Text:PDF

GTID:2518306605968099

Subject:Computer Science and Technology

Abstract/Summary:

Traditional human-computer interaction relies on contact devices such as the mouse, keyboard, and touch screen. As science and technology have developed, these methods can no longer meet people's growing interaction needs. Gestures are a common way for humans to convey information and meaning, so they are of great significance to human-computer interaction. Progress in gesture recognition therefore promotes innovation in the field and provides a more convenient, efficient, and natural way to interact in production and daily life.

This thesis investigates the background and practical significance of the dynamic isolated gesture recognition task, studies the human visual attention mechanism derived from cognitive psychology and neuroscience, and surveys related gesture recognition algorithms. On this basis it proposes a two-way fusion network algorithm based on a three-dimensional convolutional neural network and an attention mechanism (GLA-2WI3D). The GLA-2WI3D algorithm introduces the GLA (Global and Local Attention) structure, which combines local and global attention to increase the weight of hand features and thereby improve the accuracy of gesture recognition. The main work of the thesis is as follows:

(1) To address the temporal dependence of dynamic gesture recognition tasks, an extended three-dimensional convolutional neural network is used to extract spatial and temporal features at the same time. The network uses the Inception module as its core, which substitutes dense structures for sparse connections, and combines it with three-dimensional convolutions. This combination of sparse and dense structures uses wider network layers to obtain more comprehensive gesture features, which helps improve the accuracy of dynamic gesture recognition.

(2) Convolutional neural networks have local receptive fields, so obtaining a global receptive field depends on stacking multiple convolutional layers. As a result, long-distance features are difficult to correlate, and effective features are difficult to reinforce with one another. To address this, the thesis designs dual attention modules that capture contextual information: a temporal-spatial attention module and a spatiotemporal channel attention module. The temporal-spatial attention module lets features reinforce each other across the spatiotemporal dimension, and the spatiotemporal channel attention module lets neurons express temporal and spatial features effectively, increasing the proportion of effective neurons.

(3) To address redundant information such as background, lighting, and environment in dynamic gesture recognition tasks, a hand attention mechanism is designed. It has two branches. One branch adds hand features to the global features through a local hand attention module to increase the weight of hand features; the other branch uses only hand information to reduce the impact of redundant information. Combining the global dual attention mechanism with the local hand attention mechanism yields the GLA structure. The GLA structure promotes the distinctive expression of hand features and reduces the influence of redundant information on recognition accuracy. Using global and local attention mechanisms not only allocates limited computing resources to key feature computations but also produces results that are more in line with human visual cognition.

(4) To address the limited feature expression of single-modal data, the thesis exploits the complementary characteristics of different modalities: RGB data has richer texture information, while depth data better expresses the distance between objects and the imaging device. Taking RGB and depth gesture video data as input, the thesis proposes a two-way fusion gesture recognition algorithm based on the GLA structure and the I3D network, the GLA-2WI3D algorithm. The differing information in the two modalities is merged through element-wise product fusion to enrich the diversity of features. GLA-2WI3D uses the attention mechanism to promote the expression of hand information and enriches gesture-related features through two-way fusion from multiple angles. The GLA-2WI3D algorithm avoids the influence of irrelevant factors such as background and environment and improves recognition accuracy. On the IsoGD dataset, the recognition accuracy reported in this thesis reaches 73.50%.
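
The element-wise product fusion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say at which layer the RGB and depth streams are fused, so this example assumes, purely for illustration, that each stream outputs per-class scores and that fusion multiplies the two softmax distributions. The function names (`softmax`, `fuse_elementwise_product`, `predict`) and the sample scores are hypothetical and are not taken from the thesis.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_elementwise_product(rgb_vector, depth_vector):
    """Fuse two equal-length vectors by multiplying matching entries."""
    if len(rgb_vector) != len(depth_vector):
        raise ValueError("both streams must produce vectors of the same length")
    return [r * d for r, d in zip(rgb_vector, depth_vector)]


def predict(rgb_logits, depth_logits):
    """Assumed score-level fusion: multiply per-class probabilities, renormalize."""
    fused = fuse_elementwise_product(softmax(rgb_logits), softmax(depth_logits))
    total = sum(fused)
    probs = [f / total for f in fused]
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


# Hypothetical scores for three gesture classes from each stream.
label, probs = predict([2.0, 1.0, 0.0], [0.0, 1.0, 3.0])
print(label, [round(p, 3) for p in probs])
```

A product rewards classes that both streams score highly and suppresses a class that either stream considers unlikely, which is one reason it is sometimes preferred over averaging when the modalities are complementary. The same `fuse_elementwise_product` helper would apply unchanged to flattened intermediate feature vectors if the fusion were performed earlier in the network.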

Keywords/Search Tags:

Gesture recognition, Attention mechanism, I3D, Multi-modal, Two-way fusion

Related items

1	Research And Implementation Of Gesture Recognition System In Natural Scenes
2	Gesture Recognition Based On Multi-modal Fusion Of RGB-D Images
3	Research Of Emotion Recognition Based On Multi-modal Fusion
4	Research On Multimodal Gesture Location And Recognition Based On Attention Mechanism
5	Research On Key Technologies Of Gesture Recognition Based On Multi-modal Fusion
6	Research On Speech Emotion Recognition Method Based On Multi-feature And Multi-modal Fusion
7	Multi-modal Speech Emotion Recognition Based On The Attention Mechanism
8	Research Of Gesture Recognition Based On Computer Vision
9	Design And Implementation Of Gesture Recognition Method For Wearable Devices Based On Cross-modal Deep Learning
10	A Study Of Deep Learning Based Multimodal Emotion Recognition