
Research On Graph Convolution Neural Network Based On Multi-attention Mechanism For Human Action Recognition

Posted on: 2022-06-07
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Li
GTID: 2518306314974269
Subject: Software engineering
Abstract/Summary:
Human action recognition has attracted growing attention in recent years. As an important and popular research direction in computer vision, it draws on many disciplines, such as image processing, machine learning, and artificial intelligence. It has a wide range of applications in human-computer interaction, medical assistance, virtual reality, game entertainment, and intelligent video surveillance, so it has both far-reaching theoretical significance and important practical value. With the emergence and popularity of low-cost, easily accessible depth cameras (for example, Kinect, released by Microsoft), action recognition based on 3D skeleton data has become a new research direction and then a research hotspot. Because human skeleton data is robust and insensitive to changes in lighting conditions, methods based on 3D skeleton data achieve higher recognition accuracy than other methods when precise joint coordinates are available.

In the past two years, many researchers have combined data modeling with graph structures according to the characteristics of human skeleton data, applied graph convolutional neural networks to skeleton-based human action recognition, and achieved remarkable results. However, these methods have three limitations. First, they usually use a fixed topology defined by the structure of the human skeleton, which limits the range of message propagation between nodes, makes it difficult to receive and pass on node information at the global scale, and is poorly suited to modeling diverse action samples. Second, they ignore the importance of the channel dimension, so feature extraction cannot focus on the more important channel features and suppress the unimportant ones. Third, they usually use only the directly obtained joint coordinates, without fully considering intra-frame bone length and inter-frame dynamic information and without jointly using the various kinds of data, which leaves room for improvement in the recognition results.

To solve these problems, this thesis proposes a multi-attention spatiotemporal graph convolutional network, which applies attention mechanisms in the spatial and channel dimensions to improve the accuracy of graph-convolution-based action recognition. To improve it further, a temporal attention mechanism is introduced and intra-frame and inter-frame data are fully used, forming a multi-stream integrated network that achieves high-precision human action recognition. Extensive experiments were carried out on the NTU-RGBD and Kinetics datasets, which are recognized as challenging large-scale datasets, and high-accuracy results were obtained.

The innovations of this thesis are summarized as follows.

First, so that the graph convolutional network model can better learn from diverse action samples and dynamically adjust the graph structure, this thesis proposes a graph attention module. It consists of two parts. The first is a data-driven graph matrix, which is initialized according to the physical structure of the human body and the temporal information in the skeleton sequence; the connected graph is then dynamically optimized during network training, and the edge weights are adjusted to obtain a better skeleton representation. The second is a graph attention matrix, which uses the attention mechanism typical of computer vision to compute the similarity between any two joints as the edge weight, learns from diverse action samples to obtain better action representation ability, and takes part in adjusting the graph structure.

Second, because redundant features produced during feature extraction affect the final classification accuracy, this thesis proposes a channel attention module. During network training, an attention mechanism is applied to channels with different semantics to learn channel weights, so that the model can focus on the more important feature information, extract more relevant features, and reduce the influence of redundant features.

Third, to improve the accuracy of skeleton-based human action recognition, the skeleton coordinate data is diversified to form skeleton data containing intra-frame bone length and inter-frame dynamic information. Following the idea of multi-stream integration, each kind of data is fed into a network for learning and the recognition results of the multiple networks are integrated, which improves the accuracy of the whole model.
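
The graph attention module can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulation, so everything here is an assumption for illustration: the similarity between joints is taken as a scaled dot product of two linear embeddings followed by a row-wise softmax, the data-driven matrix is stood in for by a fixed row-normalized adjacency of a toy five-joint chain (in the thesis it is a trainable parameter), and the names `graph_attention_matrix`, `graph_conv`, `w_theta`, `w_phi`, and `w_out` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def graph_attention_matrix(x, w_theta, w_phi):
    """Pairwise joint similarity used as edge weights.

    x: (V, C) features of V joints; w_theta, w_phi: (C, D) embeddings.
    Returns a (V, V) matrix whose rows sum to 1.
    """
    theta = x @ w_theta
    phi = x @ w_phi
    scores = theta @ phi.T / np.sqrt(theta.shape[1])
    return softmax(scores, axis=-1)

def graph_conv(x, a_learned, w_theta, w_phi, w_out):
    """One graph convolution with adjacency = data-driven + attention."""
    a_att = graph_attention_matrix(x, w_theta, w_phi)
    a = a_learned + a_att  # attention adds edges beyond the skeleton
    return a @ x @ w_out

rng = np.random.default_rng(0)
V, C, D, C_out = 5, 3, 4, 8

# Toy chain skeleton 0-1-2-3-4 with self-loops, row-normalized.
# Assumption: stands in for the initial value of the data-driven matrix.
adj = np.eye(V)
for i in range(V - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
a_learned = adj / adj.sum(axis=1, keepdims=True)

x = rng.standard_normal((V, C))
w_theta = rng.standard_normal((C, D))
w_phi = rng.standard_normal((C, D))
w_out = rng.standard_normal((C, C_out))

att = graph_attention_matrix(x, w_theta, w_phi)
out = graph_conv(x, a_learned, w_theta, w_phi, w_out)
```

Because the softmax gives every joint pair a positive weight, joints that are not physically connected (for example, the two ends of the chain) can still exchange information, which is the limitation of a fixed topology that the module is meant to address.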
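
The channel attention module can be sketched in the same spirit. The abstract only states that channel weights are learned, so this sketch assumes a squeeze-and-excitation style design, which may differ from the thesis: features are averaged over frames and joints, passed through a two-layer bottleneck, and turned into per-channel weights by a sigmoid. The function name `channel_attention`, the weights `w1` and `w2`, and the reduction ratio `r` are hypothetical.

```python
import numpy as np

def channel_attention(x, w1, w2):
    """Reweight channels of a skeleton feature map.

    x: (C, T, V) features with C channels, T frames, V joints.
    w1: (C // r, C) and w2: (C, C // r) bottleneck weights (assumed design).
    Returns the reweighted features and the (C,) channel weights.
    """
    squeezed = x.mean(axis=(1, 2))                  # global average per channel
    hidden = np.maximum(w1 @ squeezed, 0.0)         # ReLU bottleneck
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # sigmoid, each in (0, 1)
    return x * weights[:, None, None], weights

rng = np.random.default_rng(0)
C, T, V, r = 8, 4, 5, 2
x = rng.standard_normal((C, T, V))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))

out, weights = channel_attention(x, w1, w2)
```

Channels whose weight is close to 0 are suppressed and those close to 1 pass almost unchanged, which is one way a network could reduce the influence of redundant features.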
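
The multi-stream data preparation can also be sketched briefly. The assumptions, none of which are stated in the abstract, are these: a bone is the vector from a joint's parent to the joint, inter-frame dynamics are differences between consecutive frames, and the streams are fused by summing per-class scores. The three-joint skeleton, the `parents` array, and the score values are made up for illustration.

```python
import numpy as np

def bone_stream(joints, parents):
    """Intra-frame bone vectors: each joint minus its parent joint.

    joints: (T, V, 3) coordinates; parents: (V,) parent index per joint,
    with the root pointing to itself so that its bone is zero.
    """
    return joints - joints[:, parents, :]

def motion_stream(joints):
    """Inter-frame motion: difference between consecutive frames.

    The last frame has no successor and is left as zeros.
    """
    motion = np.zeros_like(joints)
    motion[:-1] = joints[1:] - joints[:-1]
    return motion

def fuse_scores(score_list):
    """Integrate per-stream class scores by summation."""
    return np.sum(score_list, axis=0)

# Toy sequence: 3 frames, 3 joints, 3 coordinates.
joints = np.arange(27, dtype=float).reshape(3, 3, 3)
parents = np.array([0, 0, 1])  # joint 0 is the root

bones = bone_stream(joints, parents)
motion = motion_stream(joints)

# Hypothetical class scores from three separately trained networks.
scores = [
    np.array([0.1, 0.7, 0.2]),  # joint stream
    np.array([0.3, 0.3, 0.4]),  # bone stream
    np.array([0.2, 0.5, 0.3]),  # motion stream
]
fused = fuse_scores(scores)
predicted = int(np.argmax(fused))
```

In this toy example the bone stream alone would favor class 2, but the fused scores select class 1, showing how integrating several streams can change the final decision.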
Keywords/Search Tags: Action Recognition, 3D Human Skeleton, Graph Convolution Network, Attention Mechanism