| In recent years, human action recognition has become one of the most popular research topics in computer vision. It is widely used in video surveillance, video games, human-computer interaction, and other related fields. Over the past decades, researchers have proposed a large number of action recognition approaches based on RGB video sequences and have achieved impressive recognition results. However, traditional RGB data is highly sensitive to factors such as lighting conditions, scale variation, and occlusion, all of which may reduce recognition accuracy. In addition, traditional monocular video sensors cannot accurately capture human motion in 3D space. With the release of RGB-D cameras, extracting human skeletons from video sequences has become a relatively easy task. Compared with traditional RGB video sequences, skeleton-based action sequences are more robust to the aforementioned factors, and for this reason more and more researchers have turned to skeleton-based action recognition. This research focuses on the 3D skeleton representation of human actions. Firstly, Lie groups are used to describe the relative geometry and relative rotations among rigid bodies in skeleton sequences, and on this basis an attempt is made to eliminate noisy skeletons from the sequences. Secondly, tensor decomposition is used to obtain the linear relationships among rigid bodies, and linear dynamical systems based on non-negative tensors are built to model the temporal relationships among skeletons. Finally, the activity levels of the rigid bodies are used to explore the high-dimensional features that represent body motion in action sequences. These high-dimensional features are mapped to a manifold space in order to overcome the adverse effects of skeleton configurations. The innovations and contributions of this study fall into three aspects: 1. Key-skeleton-patterns 
are mined from the Lie group representation of actions for the purpose of dealing with scale variation, temporal information, noisy skeletons, and so on. This task is divided into three steps. Firstly, in order to capture scale-invariant spatial information, six rotation matrices are used to describe the direction of a rigid body in a skeleton. The rotation matrices represent the rotations between the rigid body and the three coordinate axes. Each rotation matrix is mapped to the special orthogonal group SO(3). Secondly, the motions of the rigid body between different skeletons are used to capture the temporal information of the skeleton. Similarly, these rigid body motions are represented as points on the special Euclidean group SE(3). Based on these two steps, a skeletal sequence representing a human action can be regarded as points on the Lie group (SE(3) × ··· × SE(3), SO(3) × ··· × SO(3)). Thirdly, a new pattern growth algorithm based on the PrefixSpan algorithm is proposed to mine key-skeleton-patterns from the Lie group representation of actions. The search efficiency of this algorithm is improved by reducing the number of new patterns generated in each growth step. 2. Linear dynamical systems (LDSs) are effective tools in various disciplines for modeling spatio-temporal data. Based on the non-negative tensor representation of action sequences, this study improves the parameter estimation method of traditional linear dynamical systems and analyzes the advantages of representing 3D skeleton sequences as non-negative tensors. In this paper, each human action is represented as a third-order non-negative tensor time series, and linear dynamical systems based on non-negative tensors are then proposed to model human actions. Non-negative Tucker decomposition is used to estimate the parameters of the linear dynamical systems. On this basis, the action descriptor composed of these parameters is mapped to an infinite Grassmannian. Finally, sparse coding and dictionary 
learning on the infinite Grassmannian are used to encode human actions, and an SVM is used to perform action classification. 3. An action recognition model based on Kendall’s pre-shape space is proposed to eliminate noisy joints in actions. The model uses the active joints in skeleton sequences to represent the body parts that actually move during an action. The positions of the active joints in each skeleton are then mapped to Kendall’s pre-shape space in order to obtain shape-invariant skeleton configurations. A human action is therefore regarded as a set of points on Kendall’s pre-shape space, and on this basis tensor-based linear dynamical systems (tLDSs) are used to describe the spatio-temporal features of the action. |
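
As an illustration of the mapping to Kendall’s pre-shape space mentioned in the third contribution, the following minimal sketch uses the standard construction: the joint positions of one skeleton are centered at their centroid and scaled to unit Frobenius norm, which removes translation and scale. The function name `to_preshape` and the toy joint coordinates are assumptions made for this example only; the selection of active joints and the tLDS model described above are not reproduced here.

```python
import numpy as np


def to_preshape(joints):
    """Map a (k, 3) array of joint positions to a point on the pre-shape sphere.

    Hypothetical helper for illustration: centers the configuration and
    scales it to unit Frobenius norm (the standard pre-shape construction).
    """
    joints = np.asarray(joints, dtype=float)
    centered = joints - joints.mean(axis=0)  # remove translation
    size = np.linalg.norm(centered)          # Frobenius norm of the centered configuration
    if size == 0.0:
        raise ValueError("all joints coincide; the pre-shape is undefined")
    return centered / size                   # remove scale


# Toy skeleton with four joints (made-up coordinates).
skeleton = np.array([
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 1.5, 0.2],
    [-0.5, 1.5, -0.2],
])

# The same skeleton, scaled and translated.
moved = 2.5 * skeleton + np.array([10.0, -3.0, 4.0])

z1 = to_preshape(skeleton)
z2 = to_preshape(moved)

# Both configurations map to the same pre-shape.
print(np.allclose(z1, z2))
```

Applying `to_preshape` to every frame of a sequence would yield one point on the pre-shape sphere per skeleton, which matches the description of an action as a set of points on that space. Rotation is not removed by this step.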