
Human Skeleton Action Recognition Based On Spatiotemporal Graph Attention Convolution Network

Posted on: 2021-12-03
Degree: Master
Type: Thesis
Country: China
Candidate: Q H Ma
Full Text: PDF
GTID: 2518306050470724
Subject: Electronics and Communications Engineering
Abstract/Summary:
Human action recognition has applications in many fields, such as indoor surveillance, patient monitoring, human-computer interaction, virtual reality, smart homes, smart security systems, and assisted athlete training. In a video containing motion, the human skeleton is the main carrier of motion information. Recording motion as a sequence of skeletons requires little storage and is unaffected by lighting conditions, so skeleton-based action recognition in video is a popular research direction. In a skeleton graph, each joint can be regarded as a vertex and each bone between two joints as an edge. Graph neural networks, as feature extraction algorithms designed for graph data, have achieved strong performance. Applied to the skeletons extracted from video data sets, a graph convolutional network can capture the spatiotemporal information of all joints simultaneously, aggregating the joint features as a whole. To follow how the feature vectors change across video frames, a temporal convolution is added to aggregate the state of each joint over time. To ensure the robustness of the model, a graph self-attention mechanism is added between layers, and finally a spatiotemporal graph convolutional network, ST-ChebNet, is formed, which is an effective method for skeleton-based action recognition in video. Based on this description, the thesis makes three contributions.

First, a Chebyshev convolutional network is used to extract the temporal and spatial features of the human skeleton in video. To make the features of adjacent nodes easier to tell apart, the neighbors of each joint are first divided into groups according to centrifugal and centripetal motion, which amounts to partitioning the adjacency matrix of the skeleton graph into subgraphs. A more stable symmetric normalization is then applied to the adjacency matrix to obtain the corresponding Laplacian matrix. To distinguish neighbors at different distances and capture the non-Euclidean correlations between nodes, a Spatiotemporal-ChebNet based on Chebyshev convolution is designed to extract the features of the nodes of the input skeleton graph in the time dimension. The aggregated features are finally fed to a fully convolutional network to obtain video-level predictions.

Second, to extract the changes in the feature vectors of the skeleton joints along the time dimension effectively, a temporal convolution module is constructed and cascaded with the spatiotemporal Chebyshev convolution. Standard residual connections are used to mitigate overfitting and vanishing gradients, yielding the final spatiotemporal graph convolutional network for video skeleton sequences, ST-ChebNet.

Third, partitioning the adjacency matrix into subgraphs leads to mismatched graph structures between training and testing. To address this problem, a graph attention module is designed. Based on the self-attention mechanism, this module re-aggregates the output features of each layer of the spatiotemporal graph convolutional network, so that the graph signal of each layer no longer depends on a specific graph structure and the training and test data sets match effectively. The module is cascaded with the spatiotemporal graph convolution module to obtain the final recognition model, ST-ChebANet. Introducing the self-attention mechanism improves the accuracy of the whole spatiotemporal graph convolutional network on video skeleton action recognition data sets.

The three proposed models, Spatiotemporal-ChebNet, ST-ChebNet, and ST-ChebANet, were evaluated under the X-Sub and X-View protocols on the NTU RGB+D data set. In addition, the Kinetics data set was processed with OpenPose and the resulting skeleton data was used for verification. The models improve accuracy over existing algorithms, and the experimental results support the soundness of the design.
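The symmetric normalization and Chebyshev convolution mentioned in the first contribution can be illustrated with a minimal sketch. It assumes the textbook ChebNet formulation, L = I - D^{-1/2} A D^{-1/2} and the recurrence T_k = 2 L~ T_{k-1} - T_{k-2}, which may differ from the thesis's actual implementation. The function names, the `theta` coefficients, the default `lambda_max = 2.0` (an upper bound on the eigenvalues of a normalized Laplacian), and the three-joint chain are all hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: all names and the toy graph are hypothetical,
# not taken from the thesis. Pure Python, dense matrices as lists of rows.

def matmul(a, b):
    """Multiply two dense matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def sym_norm_laplacian(adj):
    """Return L = I - D^{-1/2} A D^{-1/2} for an undirected adjacency matrix."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    inv_sqrt = [d ** -0.5 if d > 0 else 0.0 for d in deg]
    return [[(1.0 if i == j else 0.0) - inv_sqrt[i] * adj[i][j] * inv_sqrt[j]
             for j in range(n)] for i in range(n)]

def cheb_conv(lap, x, theta, lambda_max=2.0):
    """Apply sum_k theta[k] * T_k(L~) x, where L~ = 2 L / lambda_max - I.

    x holds one feature row per joint; theta[k] weights the k-th Chebyshev
    term, so len(theta) - 1 is the largest hop distance that can contribute.
    """
    n = len(lap)
    scaled = [[2.0 * lap[i][j] / lambda_max - (1.0 if i == j else 0.0)
               for j in range(n)] for i in range(n)]
    t_prev, t_curr = x, matmul(scaled, x)          # T_0 x and T_1 x
    out = [[theta[0] * v for v in row] for row in x]
    for k in range(1, len(theta)):
        out = [[o + theta[k] * t for o, t in zip(o_row, t_row)]
               for o_row, t_row in zip(out, t_curr)]
        nxt = matmul(scaled, t_curr)
        # Chebyshev recurrence: T_{k+1} x = 2 L~ T_k x - T_{k-1} x
        t_prev, t_curr = t_curr, [[2.0 * a - b for a, b in zip(n_row, p_row)]
                                  for n_row, p_row in zip(nxt, t_prev)]
    return out

# Toy skeleton: a chain of three joints (for example shoulder, elbow, wrist).
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
lap = sym_norm_laplacian(adj)
signal = [[1.0], [0.0], [0.0]]                     # one feature, on joint 0
two_hop = cheb_conv(lap, signal, theta=[0.0, 0.0, 1.0])
```

In this toy case the second-order term alone moves the signal from joint 0 to joint 2, which is the sense in which the polynomial order separates neighbors at different distances.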
Keywords/Search Tags: Video Action Recognition, Graph Chebyshev Convolution (ChebConv), Temporal Convolutional Network, Graph Attention Network