
Model based view-invariant human action recognition and segmentation

Posted on: 2008-06-13
Degree: Ph.D
Type: Dissertation
University: University of Southern California
Candidate: Lv, Fengjun
Full Text: PDF
GTID: 1448390005450974
Subject: Computer Science
Abstract/Summary:
Recognizing basic human actions such as walking, sitting down, and waving hands from a single video is an important task for many applications in video surveillance, human-computer interaction, and video content retrieval. This is a difficult problem, and the difficulties are two-fold. On the one hand, a truly view-invariant approach needs knowledge of 3D human poses. However, inferring 3D human poses from a monocular view is a difficult problem in itself because of the large number of parameters that need to be estimated and the ambiguity caused by perspective projection. On the other hand, even if such poses are given or can be perfectly recovered, action recognition is still a challenging problem because of the high dimensionality of the pose data and the large spatial and temporal variations within the same action class.

We present two approaches to address these two difficulties respectively. The first approach handles relatively ideal cases, in which 3D human poses are given as input (from motion capture or a pose tracking system); it focuses on recognizing the dynamics of each action class and on segmenting long continuous sequences into short action segments. The second approach handles more realistic cases, in which image features (human silhouettes) are used directly. We use an example-based method: instead of explicitly inferring 3D human poses, we search existing action models for a series of actions that best matches the input sequence.

Both approaches are model-based. The first approach uses Hidden Markov Models (HMMs) to learn the dynamics of each action class and to compute the probability of the observation sequence given the model parameters. In the second approach, each action is modeled as a series of synthetic 2D human poses rendered from a wide range of viewpoints. The constraints on transitions between the synthetic poses are represented by a graph model called Action Net. By taking advantage of these constraints, we eliminate many short-term errors caused by image noise and perspective ambiguity.
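The first approach's core operation, scoring an observation sequence by its probability under each action class's HMM, can be sketched with the standard scaled forward algorithm. This is a minimal illustration only: it uses discrete observation symbols as a stand-in for the dissertation's pose features, and all names and parameter values are hypothetical.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood log P(obs | model) via the scaled HMM forward algorithm.

    obs : sequence of discrete observation symbols (ints)
    pi  : (N,) initial state distribution
    A   : (N, N) transition matrix, A[i, j] = P(next state j | state i)
    B   : (N, M) emission matrix, B[i, k] = P(symbol k | state i)
    """
    alpha = pi * B[:, obs[0]]               # initialize with first observation
    log_prob = 0.0
    for t in range(1, len(obs)):
        scale = alpha.sum()                 # rescale to avoid underflow
        log_prob += np.log(scale)
        alpha /= scale
        alpha = (alpha @ A) * B[:, obs[t]]  # propagate one time step forward
    log_prob += np.log(alpha.sum())
    return log_prob
```

Classification then reduces to evaluating the sequence under each class's trained HMM and picking the class with the highest log-likelihood.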
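The second approach's matching step can be illustrated with a Viterbi-style shortest-path search over a pose transition graph: each frame scores every candidate synthetic pose against the observed silhouette, and the graph's edges restrict which pose may follow which. The graph, cost values, and function below are invented placeholders for illustration, not the dissertation's actual Action Net.

```python
def best_pose_path(costs, edges):
    """Viterbi-style search over a pose transition graph (Action Net sketch).

    costs : list over frames of dicts {pose_id: matching_cost} -- how poorly
            each synthetic pose matches the observed silhouette at that frame
            (hypothetical cost, e.g. a silhouette distance)
    edges : dict {pose_id: set of allowed successor pose_ids}
    Returns (total_cost, best path of pose ids).
    """
    # dp[p] = (cumulative cost of best path ending at pose p, that path)
    dp = {p: (c, [p]) for p, c in costs[0].items()}
    for frame in costs[1:]:
        ndp = {}
        for p, c in frame.items():
            # consider every predecessor whose edges allow reaching pose p
            cands = [(dc + c, path + [p]) for q, (dc, path) in dp.items()
                     if p in edges.get(q, ())]
            if cands:
                ndp[p] = min(cands, key=lambda x: x[0])
        dp = ndp
    return min(dp.values(), key=lambda x: x[0])
```

Because each frame's pose is chosen jointly with its neighbors along a globally cheapest path, a single noisy frame cannot force an isolated, transition-violating pose, which is the short-term error suppression the abstract describes.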
Keywords/Search Tags: Action, Human, Model