Human action recognition is an active research area in computer vision, since it plays a critical role in video understanding and has important applications such as video surveillance, human-machine interaction, and virtual reality. Recently, graph convolutional networks (GCNs) have achieved state-of-the-art results for skeleton-based action recognition by extending convolutional neural networks (CNNs) to graphs. However, because the receptive fields of graph convolution kernels are fixed in all layers, existing GCN-based methods learn only local information among adjacent joints and struggle to capture high-level interaction features, such as interactions among the five parts of the human body. Moreover, the subtle differences between confusing actions often hide in specific channels of key joints' features, but this kind of discriminative information is rarely exploited by existing methods. Based on an analysis of the strengths and weaknesses of existing action recognition models, this paper designs a structure-based graph pooling (SGP) scheme, a more lightweight model, and a joint-wise channel attention (JCA) module. The main contributions of this paper are as follows:

(1) We propose a novel structure-based graph pooling (SGP) scheme, a three-step pooling scheme based on human movement patterns. The human body can be decomposed into five regions: two arms, two legs, and the trunk; these can in turn be roughly divided into the upper body and the lower body. The SGP scheme gradually expands receptive fields without destroying the topological structure of the graph.

(2) We propose a more lightweight model with only four graph convolutional layers, fewer parameters, and less computation. Using the SGP scheme, the network iteratively learns hierarchical representations of skeleton graphs. Our method achieves competitive performance with fewer graph convolutional layers thanks to the accelerated information transfer brought by the SGP scheme, which also significantly reduces the number of parameters and the computational cost, making our model lightweight.

(3) In addition, we propose a joint-wise channel attention (JCA) module, which fuses local and global information of human actions. By applying different attention to different channels of each joint, it can mine subtle differences among similar actions. The JCA module enhances the information hidden in discriminative channels of key joints, while joints or channels containing redundant information are neglected. Therefore, our method can effectively extract local nuanced features and classify confusing actions.

We evaluate our SGP scheme and JCA module on three challenging skeleton-based action recognition datasets: NTU-RGB+D, Kinetics-M, and SYSU-3D. Our method outperforms state-of-the-art methods on all three benchmarks.
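As a rough illustration of the two ideas above, the following NumPy sketch shows a structure-based pooling hierarchy (joints → five body parts → upper/lower body → whole body) and a joint-wise channel gate. The 25-joint skeleton, the joint-to-part grouping, and the names `sgp_pool` and `jca_attention` are all assumptions for illustration, not the paper's actual implementation:

```python
import numpy as np

# Hypothetical grouping of a 25-joint skeleton into five body parts;
# the real joint-to-part assignment depends on the dataset's skeleton layout.
PARTS = [list(range(i * 5, (i + 1) * 5)) for i in range(5)]
HALVES = [[0, 1, 2], [3, 4]]  # e.g. upper body (trunk + arms), lower body (legs)
BODY = [[0, 1]]               # whole body

def sgp_pool(x, groups):
    """One SGP step: average joint features within each structural group."""
    return np.stack([x[idx].mean(axis=0) for idx in groups])

def jca_attention(x, w):
    """Joint-wise channel attention: a sigmoid gate per joint and channel."""
    gate = 1.0 / (1.0 + np.exp(-w))  # gate values lie in (0, 1)
    return x * gate                  # reweight each channel of each joint

# Demo: 25 joints, 64 feature channels.
x = np.random.rand(25, 64)
x = jca_attention(x, np.random.randn(25, 64))  # emphasize discriminative channels
parts = sgp_pool(x, PARTS)        # shape (5, 64): five body parts
halves = sgp_pool(parts, HALVES)  # shape (2, 64): upper / lower body
body = sgp_pool(halves, BODY)     # shape (1, 64): whole-body representation
print(parts.shape, halves.shape, body.shape)
```

Because each pooling step only merges structurally adjacent regions, the receptive field grows gradually without flattening the graph into a single vector in one step.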