| With the rapid development of Internet technology and fifth-generation mobile communication technology, video services have become the main form of network media. Automatic analysis and understanding of video services can help ensure the healthy development of network media. As an important branch of video understanding, human action recognition has broad application prospects in healthcare, intelligent video surveillance, and human-computer interaction, and is one of the most active topics in computer vision research. In recent years, with the continuous progress of posture recognition algorithms and depth cameras, skeleton-based human action recognition algorithms have developed rapidly. Compared with recognition algorithms based on RGB video, skeleton-based algorithms achieve higher accuracy at lower computational cost. In this paper, we use graph convolutional networks to extract spatio-temporal features from skeleton sequence data to classify human actions. The main research contents of this paper are as follows. This paper proposes a cross-spatio-temporal graph convolution algorithm for human action recognition combined with dynamic encoding. Within each frame, the algorithm constructs a static graph topology carrying spatial local information from the human skeleton structure, and designs a dynamic encoding module that aggregates context information from the other joints in the frame to construct a dynamic graph topology carrying spatial global information. A two-stream graph convolutional network extracts spatial features from the two graph topologies. Between frames, a temporal extension module computes the correlation between different joints to form an inter-frame graph topology with adaptive edge weights. A spatio-temporal cross convolutional network then extracts local features across time and space based on this graph topology, so as to avoid obstruction of information propagation in the local spatio-temporal
neighborhood. High-dimensional spatio-temporal features are obtained by stacking multiple layers of the two-stream graph convolutional network and the spatio-temporal cross convolutional network in series, and a linear layer then predicts the human action category. Experimental results show that the cross-spatio-temporal graph convolution network for human action recognition combined with dynamic encoding achieves higher accuracy than current mainstream action recognition algorithms. This paper also proposes a two-person action recognition algorithm based on spatio-temporal graph convolution with interaction relationships. Within each frame, the algorithm treats the two interacting subjects as a whole, establishes a two-person adjacency matrix according to the motion relationships of the intra-body and inter-body joints, and uses an interactive relational attention encoding module to mine latent interactive attention. After the interactive attention is stacked onto the two-person adjacency matrix, an interactive relational graph convolutional network extracts the spatial interaction features. Then, the correlation between the two subjects is established across frames to strengthen their coordination in the temporal dimension, and an interactive relational temporal convolutional network extracts the temporal interaction features from the inter-frame correlation. To make further use of data at multiple moments, a multi-scale model is introduced into the interactive relational temporal convolutional network. Finally, a multi-branch network architecture is used, and class activation mapping is introduced between the branches to mine the useful information of all active joints. Experimental results show that the proposed algorithm achieves higher accuracy than current mainstream two-person interaction behavior recognition algorithms and general behavior recognition algorithms, and effectively reduces the
interference of non-standard skeleton data with the algorithm. |
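
The idea of combining a static, skeleton-derived graph topology with a dynamic, feature-derived one can be illustrated with a minimal sketch. It is not the implementation described in the abstract: the 5-joint skeleton, the function names, the dot-product softmax used as the "dynamic encoding", and the plain averaging of the two streams are all assumptions made for illustration, and learned weight matrices are omitted entirely.

```python
import math

# Hypothetical toy skeleton (an assumption, not from the paper):
# 0 = torso, 1 = head, 2 = left hand, 3 = right hand, 4 = foot.
NUM_JOINTS = 5
BONES = [(0, 1), (0, 2), (0, 3), (0, 4)]


def static_adjacency(num_joints, bones):
    """Row-normalised adjacency with self-loops, built from the bone list."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_joints)]
           for i in range(num_joints)]
    for i, j in bones:
        adj[i][j] = 1.0
        adj[j][i] = 1.0
    return [[v / sum(row) for v in row] for row in adj]


def dynamic_adjacency(features):
    """Softmax over pairwise dot products, so every joint attends to
    every joint in the frame (a stand-in for a learned dynamic encoding)."""
    adj = []
    for fi in features:
        scores = [sum(x * y for x, y in zip(fi, fj)) for fj in features]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        adj.append([e / total for e in exps])
    return adj


def aggregate(adj, features):
    """One propagation step: joint i receives sum_j adj[i][j] * features[j]."""
    n, dim = len(features), len(features[0])
    return [[sum(adj[i][j] * features[j][d] for j in range(n))
             for d in range(dim)] for i in range(n)]


def two_stream_step(features):
    """Average the static (local) and dynamic (global) streams.
    Averaging is an assumption; a real model would learn the fusion."""
    local = aggregate(static_adjacency(NUM_JOINTS, BONES), features)
    global_ = aggregate(dynamic_adjacency(features), features)
    return [[(a + b) / 2 for a, b in zip(row_l, row_g)]
            for row_l, row_g in zip(local, global_)]


# One frame of 2-D joint coordinates (made-up values).
frame = [[0.0, 0.0], [0.0, 1.0], [-1.0, 0.5], [1.0, 0.5], [0.0, -1.0]]
fused = two_stream_step(frame)
```

In this sketch the static stream only mixes joints connected by a bone, while the dynamic stream lets any joint draw on any other joint in the same frame, which is the local versus global distinction the abstract draws.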