
Research On Human Action Recognition Method Based On Adaptive Spatial-temporal Fusion Graph Convolutional Network

Posted on: 2022-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Zheng
GTID: 2518306320475424
Subject: Computer software and theory
Abstract/Summary:
With the development of depth sensors and deep learning networks, skeleton-based human action recognition has become one of the most active problems in computer vision in recent years. Skeleton data captured by sensors represent the dynamics of human joints and are robust to complex, noisy backgrounds. Modeling the human skeleton with a graph convolutional network (GCN) achieves good recognition results, but existing approaches still have several problems: the graph topology is fixed during training, correlations between joints that are not physically connected are missed, and local spatial-temporal features cannot be extracted.

First, drawing on recent domestic and international literature on skeleton-based action recognition with graph convolutional networks, this thesis proposes a skeleton action recognition method based on a regional association adaptive graph convolutional network. Through adaptive graph convolution, a parameterized global graph, a per-sample data graph, and the model's convolution parameters are trained and updated together, with a separate graph in each layer; this makes the graph structure more flexible and the model more general across data samples. An attention mechanism is added to measure whether physically connected joints are connected and how strong each connection is. The thesis also introduces a regional association graph convolution, which captures correlations between non-physically connected joints across frames by alternately passing information between joint features and connection (edge) features. Fusing the regional association graph convolution, the adaptive graph convolution, and a temporal graph convolution yields the regional association adaptive graph convolutional network. Second-order (bone) data of the skeleton are added to supplement the original joint data, and the two are combined into a two-stream network to improve recognition performance.

Second, traditional graph convolutional networks apply spatial graph convolution and temporal graph convolution alternately, which can prevent information from flowing directly across space and time. To address this, the thesis proposes a human action recognition method based on a spatial-temporal fusion graph convolutional network. A spatial-temporal graph convolution operator introduces an expandable sliding spatial-temporal window that fuses temporal and spatial information between consecutive frames, using dense cross-spacetime edges as skip connections so that information on the graph propagates directly across space and time, which promotes spatial-temporal feature learning. Combining this operator with a new temporal graph convolution gives the spatial-temporal fusion graph convolutional network, which uses a spatial-temporal fusion pathway and a factorized pathway to obtain local spatial-temporal features and long-term features, respectively. The same joint-and-bone two-stream framework is then used to improve network performance.

Finally, experiments on the NTU-RGBD and Kinect-skeleton datasets show that the regional association graph convolution strengthens the network's ability to capture latent dependencies between non-physically connected joints, that the adaptive graph convolution improves the network's structural flexibility and generality across data, and that the regional association adaptive graph convolutional network improves the recognition rate for single-person actions to some extent. The spatial-temporal fusion graph convolution models spatial-temporal dependencies directly from the skeleton graph sequence, fuses temporal and spatial graph convolution, extracts local spatial-temporal features, and can effectively recognize long actions composed of multiple actions. Both networks use a two-stream framework to improve classification performance.
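The adaptive graph convolution summarized above adds a globally learned graph and a per-sample graph to the fixed skeleton graph. The sketch below follows the formulation popularized by 2s-AGCN (features aggregated over A + B + C, where C is a softmax-normalized similarity between embedded joints); whether the thesis uses exactly this form is an assumption, as are all names (`adaptive_graph_conv`, `W_theta`, `W_phi`). It uses a single partition, random weights, and omits the attention mechanism, purely for illustration.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along one axis
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_graph_conv(x, A, B, W_theta, W_phi, W):
    """One adaptive spatial graph convolution with a single partition (hypothetical sketch).

    x       : (C_in, T, V) features of one sample
    A       : (V, V) fixed skeleton adjacency
    B       : (V, V) global graph, a trainable parameter in a real model
    W_theta : (C_e, C_in) embedding used to build the per-sample graph
    W_phi   : (C_e, C_in) second embedding for the per-sample graph
    W       : (C_out, C_in) channel transform
    """
    V = x.shape[2]
    # Embed each joint at each frame; fold time into the feature axis
    theta = np.einsum('ec,ctv->etv', W_theta, x).reshape(-1, V)  # (C_e*T, V)
    phi = np.einsum('ec,ctv->etv', W_phi, x).reshape(-1, V)      # (C_e*T, V)
    # Per-sample graph C: row-normalized similarity between every pair of joints
    C = softmax(theta.T @ phi, axis=-1)                          # (V, V)
    adj = A + B + C
    # Sum neighbour features along the joint axis, then mix channels
    agg = np.einsum('ctv,vw->ctw', x, adj)
    return np.einsum('oc,ctw->otw', W, agg)

rng = np.random.default_rng(0)
C_in, C_e, C_out, T, V = 3, 4, 8, 5, 6
x = rng.standard_normal((C_in, T, V))
A = np.eye(V)         # placeholder skeleton graph (self-loops only)
B = np.zeros((V, V))  # global graph, set to zero here for illustration
out = adaptive_graph_conv(x, A, B,
                          rng.standard_normal((C_e, C_in)),
                          rng.standard_normal((C_e, C_in)),
                          rng.standard_normal((C_out, C_in)))
print(out.shape)  # (8, 5, 6)
```

Because B and the embedding weights are ordinary parameters, a training loop can update them alongside W, which is what lets the graph differ between layers and adapt to each sample.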
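The second-order (bone) data and the joint-and-bone two-stream framework mentioned above can be illustrated in a few lines. A common construction takes each bone as the vector from a joint's parent to the joint, and fuses the two streams late by adding their softmax class scores. The five-joint parent list, the fusion weight `alpha`, and the function names below are hypothetical; the thesis may define bones and fuse the streams differently.

```python
import numpy as np

# Hypothetical 5-joint chain: joint 0 is the root and is its own parent
PARENTS = [0, 0, 1, 2, 3]

def joints_to_bones(joints, parents):
    """(C, T, V) joint coordinates -> (C, T, V) bone vectors.

    Bone v points from parents[v] to joint v; the root bone is all zeros.
    """
    return joints - joints[:, :, parents]

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def two_stream_fusion(joint_scores, bone_scores, alpha=1.0):
    """Late fusion: weighted sum of per-stream softmax scores, then argmax."""
    fused = softmax(joint_scores) + alpha * softmax(bone_scores)
    return int(np.argmax(fused)), fused

# 3D coordinates (C=3), 1 frame, 5 joints spaced along the x-axis
joints = np.zeros((3, 1, 5))
joints[0, 0, :] = [0.0, 1.0, 3.0, 6.0, 10.0]
bones = joints_to_bones(joints, PARENTS)
print(bones[0, 0])  # [0. 1. 2. 3. 4.]

label, _ = two_stream_fusion(np.array([2.0, 1.0, 0.5]), np.array([0.1, 3.0, 0.2]))
print(label)  # 1
```

In this toy case the joint stream alone would pick class 0, but the bone stream's confident vote for class 1 wins after fusion, which is the kind of complementary evidence a two-stream design is meant to exploit.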
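The sliding spatial-temporal window with dense cross-spacetime edges described above can be sketched by tiling the V×V skeleton adjacency into a τV×τV block matrix, so each joint is linked to its own and its neighbours' positions in every frame of the window; this is the construction used by G3D in MS-G3D, which the abstract's wording closely resembles. The window size, the symmetric normalization, the zero padding, and keeping only the centre frame's nodes are assumptions made for brevity, and all function names are hypothetical.

```python
import numpy as np

def window_adjacency(A, tau):
    """Tile a (V, V) skeleton adjacency into a (tau*V, tau*V) block matrix.

    Self-loops are added first, so joint v in any frame of the window is linked
    to itself and to its skeletal neighbours in every frame of the window
    (dense cross-spacetime edges).
    """
    A_hat = A + np.eye(A.shape[0])
    return np.tile(A_hat, (tau, tau))

def normalize(adj):
    # Symmetric normalization D^{-1/2} adj D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def sliding_windows(x, tau):
    """(C, T, V) -> (T, C, tau*V): one tau-frame window centred on each frame.

    The sequence is zero-padded in time so the output keeps T windows.
    """
    C, T, V = x.shape
    pad = tau // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (0, 0)))
    return np.stack([xp[:, t:t + tau, :].reshape(C, tau * V) for t in range(T)])

def spatial_temporal_aggregate(x, A, tau=3):
    """Aggregate over the window graph and keep the centre frame: (C, T, V) -> (C, T, V)."""
    C, T, V = x.shape
    adj = normalize(window_adjacency(A, tau))
    out = sliding_windows(x, tau) @ adj                    # (T, C, tau*V)
    centre = out.reshape(T, C, tau, V)[:, :, tau // 2, :]  # (T, C, V)
    return centre.transpose(1, 0, 2)

# Hypothetical 3-joint chain 0-1-2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
x = np.random.default_rng(0).standard_normal((2, 4, 3))  # (C, T, V)
out = spatial_temporal_aggregate(x, A, tau=3)
print(out.shape)  # (2, 4, 3)
```

Unlike alternating a spatial step with a temporal step, one multiplication by the block matrix lets joint 0 in frame t receive features from joint 1 in frame t+1 directly, which is the cross-spacetime shortcut the fusion operator is designed to provide. A learned model would typically collapse the window with a trainable layer rather than simply keeping the centre frame.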
Keywords/Search Tags: Graph Convolution, Human Action Recognition, Adaption, Regional Association, Spatial-Temporal Fusion, Two-Stream Network