Human action recognition is an important and challenging task in computer vision, and skeleton-based action recognition has attracted extensive attention because human skeleton data are robust and readily available. With the development of deep learning, graph convolutional networks that model the human skeleton as a spatio-temporal graph to explore the inner connections between joints have achieved remarkable performance. However, existing methods often ignore long-range dependencies between joints, fixed temporal convolution kernels make temporal modeling inflexible, and existing models are often over-parameterized, which increases the computational cost. To solve these problems, this thesis proposes an improved action recognition model based on the graph convolutional network and human skeleton data. The main contents are as follows:

(1) To address the neglect of long-range dependencies between joints and the lack of flexibility in temporal modeling, we propose a multi-scale adaptive aggregated graph convolutional network for skeleton-based action recognition. First, a multi-scale spatial graph convolution is designed to aggregate long-range dependencies and multi-order semantic information from the skeletal data and to model the relationships between human joints comprehensively for feature learning. Then, a multi-scale temporal convolution module is proposed that adaptively selects among convolution kernels of different temporal lengths, making temporal modeling more flexible. In addition, a spatio-temporal-channel attention module is added to capture the most meaningful joint, frame, and channel information in the skeleton sequence. Finally, residual connections are introduced between modules to reuse feature information. Experiments on three large-scale public datasets (NTU RGB+D 60, NTU RGB+D 120, and Kinetics-Skeleton) demonstrate the superiority of the proposed model.

(2) To address the excessive complexity and large parameter count of the model, we construct a lightweight multi-scale spatio-temporal graph convolutional network. First, the original multi-scale spatial graph convolution module is improved with a hierarchical strategy, and dilated convolution is introduced into the temporal convolution module to obtain a wider effective receptive field without changing the size of the convolution kernel. Then, depthwise separable convolutions replace standard convolutions to reduce the number of parameters and speed up training. Moreover, a spatio-temporal position attention module is proposed to find the most informative joints in specific frames across the entire skeleton sequence, enhancing the model's ability to extract discriminative features from different action sequences. Finally, a multi-stream data fusion method is adopted to enrich the input data and the feature information available to the network. Extensive ablation and comparison experiments show that the proposed model achieves better recognition accuracy with fewer parameters, demonstrating its superiority.
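
The two parameter-saving ideas in contribution (2) can be illustrated with simple arithmetic. The sketch below counts the weights of a standard temporal convolution against a depthwise separable one and computes the temporal span covered by a dilated kernel. The channel widths (64 in, 128 out), the kernel length of 9, and the dilation rate of 2 are assumed values chosen for illustration, not settings reported in the thesis, and bias terms are omitted.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard 1D temporal convolution (bias omitted)."""
    return c_in * c_out * k


def depthwise_separable_params(c_in, c_out, k):
    """Depthwise stage (one k-tap filter per input channel)
    plus pointwise 1x1 stage that mixes channels (bias omitted)."""
    return c_in * k + c_in * c_out


def effective_kernel(k, dilation):
    """Temporal span covered by a k-tap kernel with the given dilation."""
    return dilation * (k - 1) + 1


# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out, k = 64, 128, 9

std = standard_conv_params(c_in, c_out, k)        # 64 * 128 * 9 = 73728
sep = depthwise_separable_params(c_in, c_out, k)  # 576 + 8192 = 8768
print(std, sep, round(std / sep, 1))

# A 9-tap kernel with dilation 2 spans 17 frames with the same 9 weights.
print(effective_kernel(k, dilation=2))
```

Under these assumed sizes the separable layer needs roughly one eighth of the weights, and dilation widens the temporal span without adding any, which is the trade-off the lightweight model relies on.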