3D Skeleton-based Spatio-temporal Representation And Human Action Recognition

Posted on: 2018-04-13 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: W W Ding
GTID: 1368330542493476 | Subject: Computer system architecture
Abstract/Summary:
Human behavior recognition is an active research topic in computer vision and multimedia analysis. It draws on image processing, pattern recognition, and artificial intelligence, with applications such as interactive entertainment and games, video surveillance, video retrieval, life care, and abnormal behavior detection. The main challenges lie in the accuracy of data acquisition and in modeling the dynamics of action sequences for recognition. The factors that influence the recognition rate fall into four categories: 1) occlusion, shadow, and lighting conditions; 2) view changes; 3) scale changes; 4) intra-class variability and inter-class similarity of actions.

In recent years, with the release of 3D depth cameras such as the Microsoft Kinect, 3D depth data has made it possible to capture changes in the scene, which significantly improves the recognition rate with respect to the first three factors. In addition, 3D depth cameras provide a powerful human motion capture capability that outputs the human skeleton as a set of three-dimensional joint positions.

Starting from 3D skeleton sequences of the human body, this thesis first proposes the spatio-temporal feature chain and a profile hidden Markov model based on double chains to address the temporal dynamics problem in feature sequence matching. It then puts forward a Hierarchical Self-Organizing Map that predicts actions from the importance of the actionlets in each action, acquired by Hebbian learning. Finally, to obtain the characteristics of human behavior on a low-dimensional manifold efficiently and accurately, the 3D human skeleton sequence is represented as a tensor, and the Linear Dynamical System (LDS) is extended to discover the intrinsic structure of that tensor. Representative behavior databases are used to verify the effectiveness of the proposed methods. Specifically, the main contributions and innovations of this thesis include the following four points:

1. This thesis proposes a Spatio-Temporal Feature Chain (STFC) to deal with the inconsistency of 3D skeleton sequences and the existence of repetitive actionlets. The STFC is obtained in three levels. At the first level, each 3D joint traces a trajectory of the action, also referred to as a discrete curve; the segmentation points of each actionlet are detected from the direction of motion and the curvature of this trajectory. These segmentation points also determine the start frame and end frame of the action and eliminate noise to a certain degree. At the intermediate level, a graph called the Actionlets Graph is built to represent the position and motion relationships between the actionlets of an action so that its periodic sequences can be erased; the aperiodic sequence of the action is then mined from this graph. At the last and most important level, a new model, the STFC, is proposed. It consists of several aperiodic sequences whose nodes contain several viewpoint-invariant features.

2. This thesis uses the profile Hidden Markov Model (Profile HMM) to address the problem of space-time alignment. Meaningful action units are obtained by taking advantage of the segmentation points. Once these action units are labeled, an action can be represented as a sequence of discrete symbols. To cope with abrupt changes or abnormal gestures between different performances of the same action, profile HMMs are applied to these symbol sequences, using the Viterbi and Baum-Welch algorithms, for human activity recognition.

3. This thesis presents a novel approach to learning a hierarchical spatio-temporal pattern of human activities in order to predict ongoing activities from videos that contain only the onsets of those activities. The spatio-temporal pattern is learned by a Hierarchical Self-Organizing Map (HSOM), which consists of two self-organizing maps connected via associative links trained by Hebbian learning. Ongoing activities are predicted by a variable-order Markov model, which provides the means for capturing both large-order and small-order Markov dependencies based on the training actionlet sequences.

4. This thesis extends the traditional methods for estimating the parameters of a linear dynamical system to tensor representations of action sequences, and analyzes the advantage of a higher-order tensor representation of 3D skeleton sequences. The Linear Dynamical System (LDS) is the most common model for encoding spatio-temporal time-series data in various disciplines, owing to its relative simplicity and efficiency. However, the traditional LDS treats the latent state and the observation at each video frame as column vectors. Such a vector representation does not account for the curse of dimensionality and loses valuable structural information about the human action. For this reason, we propose a generalized Linear Dynamical System (gLDS) that models tensor observations in the time series and employs Tucker decomposition to estimate the LDS parameters as action descriptors. An action can therefore be represented as a subspace, which corresponds to a point on a Grassmann manifold. Finally, we perform classification using dictionary learning and sparse coding over the Grassmann manifold and achieve a clear performance improvement.
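
As a rough illustration of the idea behind contribution 4, the sketch below uses the traditional vector LDS, not the gLDS with Tucker decomposition proposed in the thesis. It estimates the LDS parameters with a common SVD-based procedure, stacks a finite observability matrix to obtain a subspace, and compares two such subspaces through their principal angles on the Grassmann manifold. The function names, the `order` parameter, and the input layout (one flattened skeleton frame per column) are assumptions made for illustration only.

```python
import numpy as np


def fit_lds(Y, n_states):
    """Estimate (A, C) of a vector LDS from observations Y of shape (d, T).

    Assumed layout: each column of Y is one flattened skeleton frame.
    This is the usual SVD-based suboptimal estimate, not the thesis's gLDS.
    """
    Y = Y - Y.mean(axis=1, keepdims=True)       # remove the mean pose
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n_states]                         # observation matrix, (d, n)
    X = s[:n_states, None] * Vt[:n_states]      # latent states, (n, T)
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])    # least-squares transition, (n, n)
    return A, C


def lds_subspace(A, C, order=3):
    """Orthonormal basis of the finite observability matrix [C; CA; CA^2; ...]."""
    blocks = [C]
    for _ in range(order - 1):
        blocks.append(blocks[-1] @ A)
    O = np.vstack(blocks)                       # (order * d, n)
    Q, _ = np.linalg.qr(O)                      # reduced QR, orthonormal columns
    return Q


def grassmann_distance(Q1, Q2):
    """Geodesic distance between two subspaces from their principal angles."""
    cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))
    return float(np.linalg.norm(angles))
```

Under these assumptions, two skeleton sequences stored as `(d, T)` arrays could be compared with `grassmann_distance(lds_subspace(*fit_lds(Y1, 3)), lds_subspace(*fit_lds(Y2, 3)))`. A nearest-neighbour rule on this distance would be a simple stand-in for the dictionary learning and sparse coding classifier described above.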
Keywords/Search Tags: 3D Skeleton, Depth Image, Human Action Recognition, Action Segmentation, Hidden Markov Model, Self-Organizing Mapping, Tensor, Linear Dynamical System, Grassmann Manifold