
Extracting Moving People and Categorizing their Activities in Video

Posted on: 2012-08-31
Degree: Ph.D
Type: Dissertation
University: Princeton University
Candidate: Niebles Duque, Juan Carlos
Full Text: PDF
GTID: 1458390008499502
Subject: Computer Science
Abstract/Summary:
The ability to automatically detect and track human movements, recognize actions and activities, understand behavior, and predict goals and intentions has captured the attention of many computer vision scientists. A key motivation is the great potential impact this technology can have on applications such as video search and indexing, smart surveillance systems, medical research, video game interfaces, automatic sports commentary, and human-robot interaction, among others.

In this work, we focus on two important questions: given a video sequence, where are the moving humans in the sequence, and what actions or activities are they performing?

We first discuss the problem of extracting human motion volumes from video sequences. We present a fully automatic framework to detect and extract arbitrary human motion volumes from challenging real-world videos. We explore a purely top-down methodology that estimates body configurations at every frame to achieve the extraction. We also present a much more efficient approach that carefully combines bottom-up and top-down cues, enabling fast extraction in near real time.

We are not only interested in finding where the humans are in a given sequence, but also in understanding what they are doing. We present statistical models for simple human action recognition based on spatial and spatio-temporal local features. First, we show that by adapting latent topic models we can achieve competitive simple-action categorization performance in an unsupervised setting. We also present a hierarchical model for simple actions that can be characterized as a constellation of bags of features. This model leverages the spatial structure of the human body to improve action recognition.

While these models are successful at simple action recognition, their performance suffers when the actions of interest are more complex.
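The unsupervised latent-topic-model route mentioned above can be sketched roughly as follows. This is a minimal illustration, not the dissertation's actual pipeline: the bag-of-spatio-temporal-words histograms are synthetic placeholders, and scikit-learn's `LatentDirichletAllocation` stands in for whatever topic model and inference the work actually used.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)

# Hypothetical data: 30 video clips, each summarized as a histogram over a
# 50-word codebook of quantized spatio-temporal interest-point descriptors.
# Each clip draws its codewords mostly from one of 3 disjoint "action" pools.
n_clips, vocab, n_actions = 30, 50, 3
X = np.zeros((n_clips, vocab), dtype=int)
for i in range(n_clips):
    pool = (i % n_actions) * (vocab // n_actions)
    words = rng.integers(pool, pool + vocab // n_actions, size=80)
    np.add.at(X[i], words, 1)  # accumulate the bag-of-words histogram

# Fit a topic model with one latent topic per action; no labels are used.
lda = LatentDirichletAllocation(n_components=n_actions, random_state=0)
theta = lda.fit_transform(X)      # per-clip topic mixture (rows sum to 1)
labels = theta.argmax(axis=1)     # unsupervised action assignment per clip
```

With well-separated codeword pools like these, clips of the same underlying action end up sharing a dominant topic, which is the sense in which topic models can categorize actions without supervision.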
We propose a discriminative model for complex action recognition capable of leveraging the temporal structure and composition of simpler motions into complex actions. We show that the contextual information provided by the temporal structure in our model greatly improves complex action classification accuracy over state-of-the-art models for simple action recognition.
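To illustrate the idea of composing simpler motions in time, here is a hypothetical dynamic-programming sketch, not the dissertation's actual model: given per-frame classifier scores for each simple motion and an assumed temporal order of motions, it finds the segment boundaries that maximize the total score, which is one simple way temporal structure can constrain complex action recognition.

```python
import numpy as np

def best_segmentation(scores, order):
    """Hypothetical sketch: scores is a (T frames x M motions) array of
    per-frame simple-motion scores; order is the assumed sequence of K
    motion indices making up the complex action. Returns the best total
    score and the K segment start frames, via dynamic programming."""
    T = scores.shape[0]
    K = len(order)
    # dp[k, t]: best score covering frames [0, t) with the first k motions
    dp = np.full((K + 1, T + 1), -np.inf)
    back = np.zeros((K + 1, T + 1), dtype=int)
    dp[0, 0] = 0.0
    # cum[t, m] = sum of scores for motion m over frames [0, t)
    cum = np.vstack([np.zeros(scores.shape[1]), np.cumsum(scores, axis=0)])
    for k in range(1, K + 1):
        m = order[k - 1]
        for t in range(k, T + 1):
            for s in range(k - 1, t):  # segment k covers frames [s, t)
                val = dp[k - 1, s] + cum[t, m] - cum[s, m]
                if val > dp[k, t]:
                    dp[k, t], back[k, t] = val, s
    # Backtrack to recover the segment start frames.
    bounds, t = [], T
    for k in range(K, 0, -1):
        t = back[k, t]
        bounds.append(t)
    return dp[K, T], bounds[::-1]
```

For example, with six frames whose scores favor motion 0 in the first half and motion 1 in the second, and `order=[0, 1]`, the recovered boundaries are `[0, 3]`: the model splits the clip exactly where the dominant simple motion changes.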
Keywords/Search Tags:Action, Activities, Video, Human, Model, Complex