| As the application scenarios for action recognition broaden, skeleton-based action recognition has attracted increasing attention. An action sequence contains abundant temporal and spatial information from which discriminative features can be extracted, so the key to action recognition is effective spatio-temporal feature learning. Because segmenting long action sequences and labeling actions consume substantial human and material resources, learning spatio-temporal features from unlabeled data is a practical research problem. To promote the application of action recognition in safety monitoring and human-computer interaction, action prediction from partially observed action sequences is a promising research direction. The contributions of this paper are as follows. First, to effectively utilize the similarity information between actions contained in spatio-temporal manifold trajectory curves, we propose a continuous projection method to map the similarity of all action sequences, incorporate the manifold trajectory features into a deep neural network, and use a graph convolutional network for similarity learning. The effectiveness and superiority of our method are demonstrated on popular skeleton-based action recognition datasets. Second, we analyze the temporal and spatial information in skeleton-based action recognition and propose a temporal self-supervised method based on motion rate and a spatial self-supervised method based on skeleton body-part matching. We combine the two self-supervised methods using ensemble learning and demonstrate the effectiveness of the resulting spatio-temporal self-supervised method on two large skeleton-based action recognition datasets. Finally, for partially observed action sequences, we introduce a spatio-temporal feature learning module that is aware of the action completion rate, and explore action prediction and detection based on skeleton sequences. We use a convolutional network to learn from the observed action sequence, combined with a graph convolutional network to explore the temporal and spatial features. Experimental results on two common datasets show that our method achieves better performance. |