
Monocular human pose tracking and action recognition in dynamic environments

Posted on: 2012-12-01
Degree: Ph.D.
Type: Dissertation
University: University of Southern California
Candidate: Singh, Vivek Kumar
Full Text: PDF
GTID: 1458390011952368
Subject: Computer Science
Abstract/Summary:
The objective of this work is to develop an efficient method to find humans in videos captured from a single camera and to recognize the action being performed. Automatic detection of humans in a scene and understanding of the ongoing activities have been extensively studied, as a solution to this problem finds applications in diverse areas such as surveillance, video summarization, content mining and human-computer interaction, among others.

Though significant advances have been made towards finding humans in specific poses, such as an upright pose in cluttered scenes, finding a human in an arbitrary pose in an unknown environment is still a challenge. We address the problem of estimating human pose using a part-based approach that first finds body part candidates using part detectors and then enforces kinematic constraints using a tree-structured graphical model. For inference, we present a collaborative branch and bound algorithm that uses the branch and bound method to search for each part and uses kinematics from neighboring parts to guide the branching behavior and to compute bounds on the best part estimate. We use multiple, heterogeneous part detectors with varying accuracy and computation requirements, ordered in a hierarchy, to achieve more accurate and efficient pose estimation.

While the above approach deals well with pose articulations, it still fails to find humans in poses with heavy self-occlusion, such as a crouch, because it does not model inter-part occlusion. Recognizing actions from the inferred poses would therefore be unreliable. To deal with this issue, we propose a joint tracking and recognition approach that tracks the actor's pose by sampling from 3D action models and localizing each pose sample; this also allows view-invariant action recognition. We model an action as a sequence of transformations between keyposes. These action models can be obtained by annotating only a few keyposes in 2D, which avoids the need for large training data and motion capture (MoCap). For efficiently localizing a sampled pose, we generate a Pose-Specific Part Model (PSPM), which captures the appropriate kinematic and occlusion constraints in a tree structure. In addition, our approach does not require pose silhouettes and thus works well in the presence of background motion. We show improvements over previous results on two publicly available datasets as well as on a novel, augmented dataset with dynamic backgrounds.

Since the poses are sampled from action models, the above activity-driven approach works well only if the actor performs actions for which models are available, and it does not generalize well to unseen poses and actions. We address this by proposing an activity-assisted tracking framework that combines activity-driven tracking with bottom-up pose estimation by using pose samples obtained from part models in addition to those sampled from action models. We demonstrate the effectiveness of our approach on long video sequences with hand gestures.
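
The abstract describes enforcing kinematic constraints over body part candidates with a tree-structured graphical model. As a rough illustration of why a tree structure makes this tractable, the sketch below runs plain max-sum dynamic programming over a tiny part tree. It is not the collaborative branch and bound algorithm from the dissertation, which is not specified here. The function `best_pose`, the detector scores, the expected offsets and the penalty weight of 0.1 are all hypothetical values chosen for the example.

```python
from typing import Callable, Dict, List, Tuple

Loc = Tuple[int, int]


def best_pose(
    children: Dict[str, List[str]],
    root: str,
    candidates: Dict[str, List[Loc]],
    unary: Callable[[str, Loc], float],
    pairwise: Callable[[str, Loc, str, Loc], float],
) -> Tuple[float, Dict[str, Loc]]:
    """Exhaustive max-sum dynamic programming over a tree of parts (illustrative only)."""
    msg: Dict[str, List[float]] = {}  # msg[p][i]: best subtree score if p takes candidate i
    arg: Dict[Tuple[str, str], Dict[int, int]] = {}  # best child index per parent index

    def up(part: str) -> None:
        # Leaves first: each child summarizes its subtree before the parent is scored.
        for c in children.get(part, []):
            up(c)
        scores = []
        for i, loc in enumerate(candidates[part]):
            s = unary(part, loc)
            for c in children.get(part, []):
                best_j, best_v = max(
                    ((j, msg[c][j] + pairwise(part, loc, c, cl))
                     for j, cl in enumerate(candidates[c])),
                    key=lambda t: t[1],
                )
                arg.setdefault((part, c), {})[i] = best_j
                s += best_v
            scores.append(s)
        msg[part] = scores

    def down(part: str, i: int, pose: Dict[str, Loc]) -> None:
        # Backtrack from the root to read off the best candidate for every part.
        pose[part] = candidates[part][i]
        for c in children.get(part, []):
            down(c, arg[(part, c)][i], pose)

    up(root)
    root_i = max(range(len(candidates[root])), key=lambda i: msg[root][i])
    pose: Dict[str, Loc] = {}
    down(root, root_i, pose)
    return msg[root][root_i], pose


# Hypothetical toy data: a torso with two children and two candidates per part.
children = {"torso": ["head", "arm"]}
candidates = {
    "torso": [(0, 0), (10, 0)],
    "head": [(0, 2), (10, 3)],
    "arm": [(1, 0), (11, 0)],
}
det = {  # made-up detector scores
    ("torso", (0, 0)): 1.0, ("torso", (10, 0)): 0.8,
    ("head", (0, 2)): 0.2, ("head", (10, 3)): 0.9,
    ("arm", (1, 0)): 0.5, ("arm", (11, 0)): 0.6,
}
offset = {"head": (0, 2), "arm": (1, 0)}  # assumed expected child-minus-parent offsets


def unary(part: str, loc: Loc) -> float:
    return det[(part, loc)]


def pairwise(parent: str, ploc: Loc, child: str, cloc: Loc) -> float:
    # Penalize squared deviation from the expected offset (a simple kinematic prior).
    dx = (cloc[0] - ploc[0]) - offset[child][0]
    dy = (cloc[1] - ploc[1]) - offset[child][1]
    return -0.1 * (dx * dx + dy * dy)


score, pose = best_pose(children, "torso", candidates, unary, pairwise)
print(score, pose)
```

In this toy example the torso candidate with the highest detector score is not part of the selected pose, because the other torso candidate agrees better with strong head and arm detections. Trading off detector evidence against kinematic consistency in this way is the general idea behind part-based pose estimation.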
Keywords/Search Tags:Action, Human, Pose, Approach, Tracking, Recognition, Part