
Research On Spatiotemporal Feature Enhancement For Skeleton-based Human Action Recognition

Posted on: 2024-04-05
Degree: Master
Type: Thesis
Country: China
Candidate: R X Qing
Full Text: PDF
GTID: 2568307127454164
Subject: Computer Science and Technology
Abstract/Summary:
Human action recognition is a research focus of computer vision and the basis for human motion prediction and action localization. With the development of intelligent machines, medicine, and video surveillance, human action recognition has important practical value in human-computer interaction, security monitoring, and video understanding. Because skeleton data is more robust and more lightweight than RGB video data in the face of complex backgrounds or changes in motion angle, skeleton-based human action recognition has developed rapidly in recent years. With the progress of deep learning, and given the good performance of graph convolutional networks on non-Euclidean data, more and more researchers apply graph convolutional networks to skeleton-based human action recognition. In this task, extracting discriminative spatiotemporal features is the key to recognition. However, existing skeleton action recognition methods that use graph convolutional networks as the baseline still suffer from insufficient extraction of spatiotemporal dependency features, insufficient learning of latent relationships between features, and insufficiently discriminative features for the limb parts that perform the action. To address these problems, this thesis studies three aspects of skeleton data: enhancing spatiotemporal dependency features, learning latent feature relationships, and hierarchical learning. The main contents and results are as follows:

(1) A skeleton action recognition method based on the Multi-Granularity SpatioTemporal Encoder (MG-STE) is proposed. First, in the spatial domain, a Multi-Granularity Spatial Encoder (MG-SE) module divides the features along the joint dimension into feature vectors of different joint granularities; each of these features contains all the temporal information of the action at its joint granularity. Second, in the temporal domain, a Multi-Granularity Temporal Encoder (MG-TE) module divides the features along the time dimension into multiple granularities of different continuous time lengths; each of these continuous time segments contains the spatial information of all joints. Then, a Two-stream Multi-Granularity Spatio-Temporal Encoder Graph Convolutional Network (2s-MG-STEGCN) is built on the Multi-Granularity Spatio-Temporal Encoder, and the final prediction is obtained by weighted fusion of the individual prediction scores of the joint stream and the bone stream. Finally, experiments on the NTU-RGBD 60 and Kinetics-Skeleton 400 datasets verify the effectiveness of the proposed method.

(2) A human skeleton action recognition method based on a Feature Difference and Feature Correlation Learning Mechanism (FDCL-GCN) is proposed. First, the Temporal Feature Difference and Correlation Learning (TFDCL) module learns the feature correlation between related parts in adjacent frames and captures feature differences from the changes in joint motion over the entire long-term timeline. Second, the Channel Feature Difference and Correlation Learning (CFDCL) module uses independent convolution kernels to interact with different channels, producing more complex feature maps that highlight the key joints with high influence on the whole movement. Then, considering that all joints are involved in maintaining the progression of the motion and the balance of the body, the Temporal Channel Context Topology (TCCT) module dynamically learns a context topology to enhance global features. Finally, experiments on the NTU-RGBD 60 and Kinetics-Skeleton 400 datasets verify the competitiveness of the proposed method.

(3) A method based on Hierarchical Learning Strategies and Short-term Motion Enhancement (HLS-SME) is proposed. First, Hierarchical Learning Strategies (HLS) are proposed to perform hierarchical learning on skeleton data. Because training independent streams on multiple data modalities is costly and feature information cannot be shared between modalities, the multimodal data is first processed in a unified way so that the modalities share global coordinates. The modalities are then fed into their respective network pipelines for training, so that knowledge can be shared between them. Second, a Short-term Motion Expansion (SME) module is proposed to enhance short-term motion features. Finally, experiments on three large public datasets, NTU-RGBD 60, NTU-RGBD 120, and Kinetics-Skeleton 400, show the competitiveness and effectiveness of the proposed method.

In summary, this thesis takes the graph convolutional network as its basic framework, studies human action recognition based on skeleton data, and proposes three skeleton action recognition methods. Experiments on multiple public skeleton datasets demonstrate the good performance of the proposed methods and indicate that this research has both theoretical and practical value.
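The weighted score fusion of the joint stream and bone stream mentioned in contribution (1) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `fuse_scores` and `predict_class`, the single `joint_weight` parameter, and the example scores are all hypothetical, and the abstract does not state how the weights are actually chosen.

```python
from typing import List, Sequence


def fuse_scores(
    joint_scores: Sequence[float],
    bone_scores: Sequence[float],
    joint_weight: float = 0.5,
) -> List[float]:
    """Weighted sum of per-class scores from two streams.

    joint_weight is a hypothetical fusion weight in [0, 1]; the bone
    stream receives the remaining weight (1 - joint_weight).
    """
    if len(joint_scores) != len(bone_scores):
        raise ValueError("both streams must score the same number of classes")
    if not 0.0 <= joint_weight <= 1.0:
        raise ValueError("joint_weight must lie in [0, 1]")
    bone_weight = 1.0 - joint_weight
    return [
        joint_weight * j + bone_weight * b
        for j, b in zip(joint_scores, bone_scores)
    ]


def predict_class(scores: Sequence[float]) -> int:
    """Index of the highest fused score (the predicted action class)."""
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up per-class scores for three action classes.
joint = [2.0, 0.5, 1.0]
bone = [0.0, 3.0, 1.0]

fused = fuse_scores(joint, bone, joint_weight=0.5)
predicted = predict_class(fused)
```

With equal weights the bone stream's strong score for the second class dominates the fused result; raising `joint_weight` shifts the decision toward the joint stream, which is the trade-off a fusion weight is meant to control.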
Keywords/Search Tags: Skeleton action recognition, graph convolutional networks, feature enhancement, spatio-temporal feature learning, hierarchical learning