Human action recognition occupies an important place in computer vision. Skeleton data has become the mainstream input for this task because it offers several advantages over other data modalities. Broadly, behavior recognition methods fall into two categories: classical traditional methods and the deep learning methods that have dominated in recent years. Deep-learning-based behavior recognition is currently a hot research topic and follows three main directions: convolutional neural networks, recurrent neural networks, and graph convolutional networks. Because the human skeleton naturally forms a graph topology that is well suited to graph convolution, skeleton-based behavior recognition with graph convolution has become a sustained research focus. This paper first introduces the principle of graph convolution and its application to human behavior recognition in detail. Then, addressing the shortcomings of existing networks, it proposes two models, SMT-DGCN and LGFS-GCN. Both models use feature reconstruction to improve the feature extraction of ST-GCN, and each targets one defect of graph convolution: SMT-DGCN addresses the tendency of node features to dissipate as layers deepen, and LGFS-GCN addresses the limited ability to construct global features. Finally, extensive experiments on large-scale datasets demonstrate the effectiveness of both networks. In summary, this paper makes three main contributions.

(1) We propose SMT-DGCN, a spatio-temporal feature extraction network incorporating dense connectivity. It mitigates the tendency of node features to dissipate as layers deepen during graph convolution by using a dense connectivity mechanism that connects the features of all layers. In spatial modeling, dense connectivity is introduced into the ST-GCN network to make full use of the features of every layer; in temporal modeling, multi-temporal convolution is used to extract rich motion features. This structure has two advantages. First, the dense connectivity mechanism reuses features and thereby improves feature utilization. Second, the multi-temporal convolution module refines the temporal features and enriches the temporal motion information. Experiments demonstrate the advanced recognition performance of the model.

(2) We propose LGFS-GCN, a graph convolution model with separated local and global features. By constructing local node features and global node features separately, it addresses the insufficient ability of traditional graph convolution to construct global features. Local features are constructed by the original GCN network, while global features are constructed directly by the dynamic transformer encoding module (DTEM), which encodes and maps the original coordinate vectors. This construction allows the network to automatically learn both local and global features that benefit recognition on the fixed graph structure. Experimental analysis of LGFS-GCN shows that this fine-grained separation of features for classification achieves advanced performance.

(3) When fusing the dense connectivity mechanism with the self-attention mechanism, we introduce several refinements that improve recognition performance in the details, such as replacing TCN with MTCN, replacing Softmax with Sigmoid, and adopting a Concat fusion strategy.

Extensive experiments with SMT-DGCN and LGFS-GCN on the large-scale behavior recognition datasets NTU-RGBD60, NTU-RGBD120, and Kinetics400 demonstrate that ST-GCN networks incorporating dense connectivity and self-attention
mechanisms can alleviate the inherent shortcomings of traditional graph convolution models, namely the tendency of node features to dissipate as layers deepen and the lack of global feature extraction capability.
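Contribution (1) describes adding dense connectivity to the spatial graph convolution of ST-GCN. The sketch below illustrates that idea under stated assumptions: a single-partition spatial graph convolution of the form ReLU(Â H W) with the symmetric normalization Â = D^{-1/2}(A + I)D^{-1/2}, random weights, and a hypothetical five-joint chain skeleton. Each layer receives the concatenation of the input and all earlier layer outputs, DenseNet-style. The helper names (`dense_gcn`, `growth`, and so on) are illustrative and are not taken from SMT-DGCN itself.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(edges, num_joints):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def gcn_layer(a_hat, h, w):
    """One spatial graph convolution: ReLU(A_hat @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]


def concat_channels(blocks):
    """Concatenate per-joint feature rows of several (joints x channels) matrices."""
    return [sum(rows, []) for rows in zip(*blocks)]


def dense_gcn(a_hat, x, num_layers, growth, seed=0):
    """Densely connected GCN stack (hypothetical helper).

    Layer l receives the concatenation of the input and the outputs of all
    earlier layers; the returned features concatenate every stage.
    """
    rng = random.Random(seed)
    stages = [x]
    for _ in range(num_layers):
        h = concat_channels(stages)
        w = [[rng.uniform(-1.0, 1.0) for _ in range(growth)] for _ in range(len(h[0]))]
        stages.append(gcn_layer(a_hat, h, w))
    return concat_channels(stages)


# Usage: a five-joint chain with (x, y, z) coordinates per joint.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
a_hat = normalized_adjacency(edges, 5)
x = [[float(j), 0.5 * j, 1.0] for j in range(5)]
out = dense_gcn(a_hat, x, num_layers=2, growth=4)
print(len(out), len(out[0]))  # 5 11  (3 input + 4 + 4 channels)
```

Because every stage's output is kept and passed forward, shallow-layer features stay directly available to the final representation instead of only reaching it through repeated neighbourhood averaging; this is one intuition for using dense connectivity against feature dissipation.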
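Contribution (1) also uses multi-temporal convolution to enrich temporal motion features. The abstract does not specify the module's internals, so the sketch below assumes one common design: parallel temporal convolution branches with different dilation rates, applied along the frame axis of a single joint coordinate. The function names and the fixed averaging kernel are hypothetical.

```python
def temporal_conv(seq, kernel, dilation):
    """Dilated 1-D convolution over time with zero padding (odd kernel sizes),
    so the output has the same length as the input."""
    pad = dilation * (len(kernel) - 1) // 2
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - pad + i * dilation
            if 0 <= idx < len(seq):  # positions outside the sequence count as zero
                acc += w * seq[idx]
        out.append(acc)
    return out


def multi_temporal_conv(seq, kernel, dilations):
    """Run the same kernel at several dilation rates and stack the branches.

    Small dilations respond to short-range motion, larger ones to longer-range
    motion. In a trainable module each branch would normally have its own
    learned weights and the branches would then be fused.
    """
    return [temporal_conv(seq, kernel, d) for d in dilations]


# Usage: an impulse in one joint coordinate's trajectory over 7 frames.
seq = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
branches = multi_temporal_conv(seq, [1.0, 1.0, 1.0], dilations=[1, 2])
print(branches[0])  # [0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
print(branches[1])  # [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0]
```

With the same three-tap kernel, dilation 1 covers three consecutive frames while dilation 2 spans five frames, so stacking branches widens the temporal receptive field without adding parameters per branch.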
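Contribution (2) builds local features with the original GCN and global features with DTEM from the raw coordinates, and contribution (3) mentions replacing Softmax with Sigmoid and fusing by concatenation. The abstract does not describe DTEM's internals, so the sketch below substitutes plain single-head scaled dot-product self-attention over the raw joint coordinates for the global branch, and a parameter-free neighbour mean for the GCN local branch. All function names are hypothetical, and the `use_sigmoid` switch only contrasts the two normalizations; it is not a claim about where LGFS-GCN applies Sigmoid.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def local_features(x, neighbors):
    """Stand-in for the GCN branch: mean of each joint and its skeleton neighbours."""
    out = []
    for j, row in enumerate(x):
        group = [row] + [x[n] for n in neighbors[j]]
        out.append([sum(vals) / len(group) for vals in zip(*group)])
    return out


def global_features(x, wq, wk, wv, use_sigmoid=True):
    """Single-head self-attention over all joints (stand-in for a transformer encoder).

    Every joint attends to every other joint regardless of skeleton edges.
    Sigmoid scores each pair independently; softmax makes each row sum to 1.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    if use_sigmoid:
        weights = [[sigmoid(s) for s in row] for row in scores]
    else:
        weights = []
        for row in scores:
            m = max(row)  # subtract the row max for numerical stability
            exps = [math.exp(s - m) for s in row]
            total = sum(exps)
            weights.append([e / total for e in exps])
    return matmul(weights, v), weights


def concat_fuse(local, global_):
    """Concat fusion: join local and global channels for each joint."""
    return [loc + glo for loc, glo in zip(local, global_)]


# Usage: three joints in a chain with 2-D coordinates and identity projections.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
eye = [[1.0, 0.0], [0.0, 1.0]]
glob, w_sig = global_features(x, eye, eye, eye, use_sigmoid=True)
_, w_soft = global_features(x, eye, eye, eye, use_sigmoid=False)
fused = concat_fuse(local_features(x, neighbors), glob)
print(len(fused[0]))  # 4
```

With softmax, the attention weights of one joint compete and sum to 1; with sigmoid, each pair is gated independently in (0, 1), so a joint can weight several others strongly at once. Concatenation keeps local and global channels separate for the classifier instead of mixing them by addition.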