| Action recognition is a fundamental research direction in computer vision, with wide applications in intelligent monitoring, medical care, sign language translation, human-computer interaction, virtual reality, and other fields. The skeleton is a compact representation of the state of the human body, so skeleton-based methods are a core branch of action recognition. Skeleton data describe actions well, are free of background noise, and require little computation. Deep learning methods are now the mainstream approach to skeleton-based action recognition. In particular, methods based on graph convolutional networks (GCNs), which fully express the natural structure of the skeleton, have achieved performance breakthroughs. However, these methods still have defects. This paper analyzes the weaknesses of current methods and addresses the problems that limit their performance. The body regions involved in different actions vary in size and span many levels of receptive field. This paper therefore constructs a new model named MSGCN, which builds action descriptions at different semantic levels and effectively exploits prior knowledge of the skeleton hierarchy. It also explores large-scale spatiotemporal receptive fields more efficiently and enhances the ability to adapt to the body-region sizes of different actions. Experiments show that MSGCN achieves higher recognition accuracy than previous methods while halving the number of network layers, with fewer parameters and faster inference. Existing GCN-based methods have insufficient representation ability, and redundant connections between nodes interfere with the modeling of important relationships. This paper constructs RS-GCN, which filters out those connections in a learnable way so that they do not interfere with spatiotemporal relation modeling. RS-GCN offers more flexible graph topology modeling, stronger action description, and better context 
extraction. In experiments, RS-GCN learns sparse and effective graph topologies and significantly improves action recognition accuracy. Existing GCNs also model relationships in the spatiotemporal domain incompletely. Moreover, sharing one graph structure across channels limits their feature processing capability. This paper constructs a new model named Hybrid Net, which integrates a graph convolutional network with a convolutional neural network. It fully models node relations in local spatiotemporal domains, avoids sharing graph structures between channels, and transforms graph-structured feature maps into grid-like feature maps while retaining structural information. Experimental results show that Hybrid Net achieves higher recognition accuracy than graph convolutional networks and convolutional neural networks, especially on difficult action classes. In summary, this paper studies in depth the shortcomings of existing GCN-based skeleton action recognition methods in multi-level semantic modeling, spatial graph construction, and spatiotemporal relation modeling, and proposes new techniques to address them. These methods improve and extend existing graph convolution methods from different perspectives, based on an understanding of the skeleton-based action recognition task. Experiments are conducted on datasets including NTU RGB+D, NTU RGB+D 120, Kinetics, and Northwestern-UCLA. The experiments verify the validity of these methods, with encouraging results. |