In recent years, pattern recognition based on human motion and deep learning has become an important branch of computer vision, with great application value in fields such as human-computer interaction and security surveillance. Meanwhile, with the advent of advanced motion-capture devices, depth cameras can directly output the key 3D skeleton points of the human body, and this simple and robust human representation is now widely used. Based on these facts, this research takes the human 3D skeleton sequence as the data format and regards gait information and action information as the carriers of human motion, in order to explore person re-identification and action recognition. The main contributions of this paper are summarized as follows:

(1) Multi-Level Graph Encoding with Structural-Collaborative Relation Learning for Skeleton-Based Person Re-Identification. To fully explore body relations, we construct graphs that model human skeletons at different levels and, for the first time, propose a Multi-level Graph encoding approach with Structural-Collaborative Relation learning (MG-SCR) to encode discriminative graph features for person re-identification (Re-ID). Specifically, considering that structurally connected body components are highly correlated within a skeleton, we first propose a multi-head structural relation layer that learns different relations among neighboring body-component nodes in the graphs, which helps aggregate key correlative features into effective node representations. Second, inspired by the fact that body-component collaboration during walking usually carries recognizable patterns, we propose a cross-level collaborative relation layer that infers collaboration between components at different levels, so as to capture more discriminative skeleton-graph features. Finally, to enhance the encoding of graph dynamics, we propose a novel self-supervised sparse sequential prediction task for model pre-training, which facilitates encoding high-level graph semantics for person Re-ID. In addition, we demonstrate the
effectiveness of the multi-head structural relation layer and the cross-level collaborative relation layer through formula derivation.

(2) Attention-Based Multi-Level Co-occurrence Graph Convolutional LSTM for 3D Action Recognition. Standard Long Short-Term Memory (LSTM) based models cannot fully model the relationships between different body joints or persons, and thus fail to extract crucial co-occurrence features at different levels. We therefore design a core module, the Multi-Level Co-occurrence Graph Convolutional LSTM, which creates multi-level co-occurrence (MC) memory units coupled with graph convolutional networks (GCNs) to automatically model the spatial relationships between joints while simultaneously capturing co-occurrence features across different joints, persons, and frames. In addition, we design a spatial attention module that enhances the features of key joints in the input skeleton sequence before feeding them to the core module. Last, we construct aggregated features of multi-level co-occurrences (AFMC) from the MC memory units to better encode the intra-frame action context, and leverage a concurrent LSTM (Co-LSTM) to further model their temporal dynamics for action recognition.

(3) Augmented Skeleton Based Contrastive Action Learning with Momentum LSTM for Unsupervised Action Recognition. We propose, for the first time, a contrastive action learning paradigm named AS-CAL that exploits different augmentations of unlabeled skeleton sequences to learn action representations in an unsupervised manner. Specifically, we first propose contrasting the similarity between augmented instances of the input skeleton sequence, transformed with multiple novel augmentation strategies, to learn the action patterns that remain invariant ("pattern invariance") across different skeleton transformations. Second, to encourage learning this pattern invariance with more consistent action representations, we propose a momentum LSTM, implemented as a momentum-based moving average of the LSTM query encoder, to encode the long-term action dynamics of the
key sequence. Third, we introduce a queue to store the encoded keys, which allows flexibly reusing preceding keys to build a consistent dictionary that facilitates contrastive learning. Last, we propose a novel representation named Contrastive Action Encoding (CAE) to represent human actions effectively.

(4) Prototypical Contrast and Reverse Prediction: Unsupervised Skeleton Based Action Recognition. Unlike plain motion prediction, PCRP performs reverse motion prediction based on an encoder-decoder structure to extract more discriminative temporal patterns, and derives action prototypes by clustering to explore the inherent action similarity within the action encodings. Specifically, we regard action prototypes as latent variables and formulate PCRP as an expectation-maximization (EM) task. PCRP iteratively runs (1) an E-step, which determines the distribution of action prototypes by clustering the action encodings from the encoder while estimating the concentration around each prototype, and (2) an M-step, which optimizes the model by minimizing the proposed ProtoMAE loss; this simultaneously pulls each action encoding closer to its assigned prototype via contrastive learning and performs the reverse motion prediction task. Besides, sorting can also serve as a temporal task similar to reverse prediction in the proposed framework.

With the above four models, this paper conducts extensive experiments on different datasets, which verify their effectiveness and provide a valuable reference for research on, and applications of, similar human motion pattern recognition.
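To make the momentum-encoder and key-queue mechanism of (3) concrete, the following is a minimal sketch of the two generic operations it relies on: an exponential-moving-average update of the key encoder's parameters, and a fixed-size first-in-first-out dictionary of encoded keys. This is an illustration under stated assumptions, not the thesis implementation: `momentum_update`, `KeyQueue`, and all parameter names are hypothetical, and the encoders are reduced to plain parameter arrays.

```python
import numpy as np

def momentum_update(key_params, query_params, m=0.999):
    """EMA update of the key encoder: theta_k <- m * theta_k + (1 - m) * theta_q.

    Only the query encoder is trained by back-propagation; the key encoder
    follows it slowly, which keeps the encoded keys consistent over time.
    """
    return [m * k + (1 - m) * q for k, q in zip(key_params, query_params)]

class KeyQueue:
    """Fixed-size FIFO dictionary of encoded keys; oldest entries are overwritten."""

    def __init__(self, size, dim):
        self.keys = np.zeros((size, dim), dtype=np.float32)
        self.ptr = 0          # position where the next batch is written
        self.size = size

    def enqueue(self, batch):
        # batch: (B, dim) array of keys produced by the momentum encoder
        n = batch.shape[0]
        idx = (self.ptr + np.arange(n)) % self.size   # wrap around the buffer
        self.keys[idx] = batch
        self.ptr = (self.ptr + n) % self.size
```

In a training loop, each step would encode an augmented sequence with the query encoder, encode its counterpart with the key encoder, contrast the query against the current queue contents, enqueue the new keys, and then call `momentum_update` on the key encoder's parameters.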