Action recognition has been a research hotspot in computer vision owing to its wide applications in human-computer interaction, intelligent surveillance, and video understanding. Compared with RGB video, skeleton data are more robust to changes in illumination, environment, and camera viewpoint, and graph convolutional networks can model the topology of the human skeleton effectively. As a result, skeleton-based human action recognition with graph convolutional networks has drawn significant attention from researchers. Although existing studies have achieved notable results, problems remain, such as insufficient exploration of the interactions between joints that are not physically connected, high model inference cost, and difficulty in distinguishing different actions with similar motion trajectories. To address these problems, this thesis proposes three graph convolutional network models for skeleton data. The primary research contents are as follows:

(1) An adaptive activation graph convolutional network is proposed to explore the interactions between non-physically connected joints more effectively and to reduce temporally redundant information. Firstly, the similarity between joints in an embedding space is computed and used as the weights of the edges connecting nodes, so that the spatial topology of the skeleton is learned adaptively. Secondly, richer spatio-temporal features are extracted using class activation maps and a multi-stream network architecture. Finally, a temporal feature aggregation module is introduced to reduce temporal redundancy by using dilated convolutions to aggregate frame-level features across skipped frames. The proposed method outperforms the classical two-stream adaptive graph convolutional network on two skeleton action recognition datasets, NTU RGB+D and NTU RGB+D 120; on the Cross-Subject and Cross-View benchmarks of NTU RGB+D, the adaptive activation graph convolutional network reaches recognition accuracies of 88.9% and 94.5%, respectively. The experimental results show that the proposed method is effective for skeleton-based human action recognition.

(2) To make effective use of the spatio-temporal semantic information latent in human skeleton sequences and to extract multi-scale features, a semantically guided multiscale neural network is proposed. Firstly, to enhance the representation of human motion features, joint-type and frame-index semantic information are embedded into the spatial and temporal modeling, respectively. Secondly, based on the original skeleton structure, multi-scale skeleton information that preserves the dependencies of the original skeleton is obtained by aggregating neighboring joints and is modeled with an adaptive graph convolutional network to extract spatial multi-scale features. Finally, by grouping the neurons of the temporal convolution network and applying dilated convolutions with different dilation rates, a multi-scale temporal convolutional network is constructed to extract temporal multi-scale features. Experiments are conducted on the NTU RGB+D and NTU RGB+D 120 skeleton action recognition datasets; on the Cross-Subject and Cross-View benchmarks of NTU RGB+D, the semantically guided multiscale neural network achieves recognition accuracies of 90.1% and 95.8% with only 0.93 M parameters, respectively. The results show that the model improves recognition accuracy while reducing computational cost.
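The operation shared by contributions (1) and (2) — learning edge weights from the embedding-space similarity between joints — can be illustrated with a minimal PyTorch sketch. The module name, embedding dimension, time pooling, and softmax normalization below are assumptions chosen for illustration; they follow the common two-stream adaptive graph convolution formulation rather than the exact implementation described in this thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveGraphConv(nn.Module):
    """Graph convolution whose adjacency is learned from joint similarity
    in an embedding space (illustrative sketch, not the thesis code)."""

    def __init__(self, in_channels, out_channels, num_joints, embed_channels=16):
        super().__init__()
        # 1x1 convolutions that embed each joint's features before comparison
        self.theta = nn.Conv2d(in_channels, embed_channels, kernel_size=1)
        self.phi = nn.Conv2d(in_channels, embed_channels, kernel_size=1)
        self.out = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        # learnable bias adjacency shared across samples (assumed residual topology term)
        self.B = nn.Parameter(torch.zeros(num_joints, num_joints))

    def forward(self, x):
        # x: (N, C, T, V) = batch, channels, frames, joints
        q = self.theta(x).mean(dim=2)              # (N, Ce, V), pooled over time
        k = self.phi(x).mean(dim=2)                # (N, Ce, V)
        sim = torch.einsum('ncv,ncw->nvw', q, k)   # pairwise joint similarity (N, V, V)
        A = F.softmax(sim, dim=-1) + self.B        # data-dependent + learnable edges
        # aggregate joint features along the learned edges
        y = torch.einsum('nctv,nvw->nctw', x, A)
        return self.out(y)
```

In the full networks, such learned edges would typically be combined with the physical skeleton adjacency and stacked with temporal convolutions; those details are omitted here for brevity.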
(3) To address the difficulty of distinguishing different actions with similar motion trajectories, a topologically refined graph convolutional network based on multi-order features is proposed. Firstly, since the angles formed between joints during human motion are distinctive, angular features are encoded into the joint, bone, and motion information to improve the model's ability to distinguish actions with similar motion trajectories without additional training cost. Secondly, the joint, bone, and motion information with embedded angular features is modeled by topologically refined graph convolutional networks to extract complementary spatio-temporal features. Finally, a spatio-temporal information sliding extraction module is designed to enhance the correlation of higher-order spatio-temporal features. The multi-stream network consisting of joint, bone, and motion branches is evaluated on three skeleton action recognition datasets, NTU RGB+D, NTU RGB+D 120, and Northwestern-UCLA; on the Cross-Subject and Cross-View benchmarks of NTU RGB+D, the recognition accuracy reaches 92.8% and 97.0%, respectively. The experimental results demonstrate the superiority of the method.
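As a rough illustration of how the angular cues in contribution (3) can be derived from raw joint coordinates, the sketch below computes, for selected joints, the cosine of the angle spanned by two incident bones. The chosen bone triples and the way such features would be fused with the joint, bone, and motion streams are assumptions made for the example; the thesis's exact angular encoding may differ.

```python
import torch

def joint_angle_features(joints, bone_triples, eps=1e-6):
    """Cosine of the angle at each listed joint, formed by two incident bones.

    joints:       (N, T, V, 3) xyz coordinates per clip, frame, and joint
    bone_triples: list of (center, end_a, end_b) joint-index triples
    returns:      (N, T, len(bone_triples)) angular features
    """
    feats = []
    for center, a, b in bone_triples:
        u = joints[..., a, :] - joints[..., center, :]   # bone vector center -> a
        v = joints[..., b, :] - joints[..., center, :]   # bone vector center -> b
        cos = (u * v).sum(-1) / (u.norm(dim=-1) * v.norm(dim=-1) + eps)
        feats.append(cos)
    return torch.stack(feats, dim=-1)

# Example: elbow angles; the joint indices below are assumed for illustration only.
elbow_triples = [(5, 4, 6), (9, 8, 10)]
x = torch.randn(2, 64, 25, 3)                    # dummy batch: 2 clips, 64 frames, 25 joints
angles = joint_angle_features(x, elbow_triples)  # (2, 64, 2)
```

Because such angles depend only on relative joint positions, appending them to each stream adds discriminative cues for trajectory-similar actions at negligible extra cost, which is the motivation stated for contribution (3).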