
Scalable action recognition in continuous video streams

Posted on: 2013-07-29
Degree: Ph.D
Type: Dissertation
University: University of California, Irvine
Candidate: Pirsiavash, Hamed
Full Text: PDF
GTID: 1458390008485413
Subject: Computer Science
Abstract/Summary:
Activity recognition in video has a variety of applications, including rehabilitation, surveillance, and video retrieval. It is relatively easy for a human to recognize actions in a video after watching it. In many applications, however, the videos are very long (e.g., in life-logging) or require real-time detection (e.g., in human-computer interaction). This motivates us to build computer vision and artificial intelligence algorithms that recognize activities in video sequences automatically.

We address several challenges in activity recognition: (1) computational scalability, (2) spatio-temporal feature extraction, (3) spatio-temporal models, and (4) dataset development.

(1) Computational scalability: We develop "steerable" models that parsimoniously represent a large collection of templates with a small number of parameters. This yields local detectors scalable to large numbers of frames and object/action categories.

(2) Spatio-temporal feature extraction: Feature extraction is difficult in scenes with many moving objects that interact and occlude each other. We tackle this problem in the framework of multi-object tracking, developing linear-time, scalable graph-theoretic algorithms for inference.

(3) Spatio-temporal models: Actions exhibit complex temporal structure, such as sub-actions of variable duration and compositional orderings. Much research on action recognition ignores such structure and instead focuses on K-way classification of temporally pre-segmented video clips (Poppe 2010; Aggarwal and Ryoo 2011). We describe lightweight, efficient grammars that segment a continuous video stream into a hierarchical parse of multiple actions and sub-actions.

(4) Dataset development: Finally, in terms of evaluation, video benchmarks are relatively scarce compared to the abundance of image benchmarks, since it is difficult to collect (and annotate) large-scale, unscripted footage of people doing interesting things. We discuss one solution, introducing a new large-scale benchmark for detecting activities of daily living (ADL) in first-person camera views.
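The "steerable" representation in (1) can be illustrated with a toy sketch. The abstract gives no implementation details, so the numbers, names, and dimensions below are hypothetical; the idea shown is only the general one of storing each template as a few coefficients over a small shared basis, so that scoring all N templates costs K basis projections plus a cheap recombination instead of N full dot-products.

```python
# Hypothetical toy sketch of a "steerable" template bank: many detectors
# represented as linear combinations of a small shared basis.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# K = 2 basis filters of feature dimension 4 (illustrative numbers).
basis = [[1.0, 0.0, 1.0, 0.0],
         [0.0, 1.0, 0.0, 1.0]]

# N = 3 templates, each stored only as K steering coefficients.
coeffs = [[2.0, 0.5],
          [1.0, 1.0],
          [0.0, 3.0]]

def score_all(window):
    """Score one feature window against every template via the basis."""
    b = [dot(f, window) for f in basis]   # K cheap projections
    return [dot(c, b) for c in coeffs]    # N-by-K recombination

window = [1.0, 2.0, 3.0, 4.0]
scores = score_all(window)

# Sanity check: identical to scoring with the explicit N templates.
templates = [[sum(c[k] * basis[k][i] for k in range(2)) for i in range(4)]
             for c in coeffs]
explicit = [dot(t, window) for t in templates]
assert all(abs(s - e) < 1e-9 for s, e in zip(scores, explicit))
```

The saving is the point: with many templates (N large) and a compact basis (K small), the per-window cost drops from N full correlations to K correlations plus an N-by-K matrix-vector product.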
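The linear-time graph-theoretic inference in (2) can likewise be sketched with a toy example. The dissertation's formulation is multi-object (a min-cost-flow network over detections); the sketch below is a deliberate single-track simplification, a shortest-path-style dynamic program over per-frame detections, with all scores, costs, and function names invented for illustration.

```python
# Toy sketch (hypothetical data): best-scoring single track through
# per-frame detections, found by dynamic programming in time linear in
# the number of frames times transitions per frame.

def best_track(frames, transition_cost):
    """frames: list of per-frame lists of detection scores (higher is
    better). transition_cost(i, j): cost of linking detection i in one
    frame to detection j in the next. Returns (total score, indices)."""
    vals = list(frames[0])          # DP value per detection, frame 0
    back = [[None] * len(frames[0])]
    for t in range(1, len(frames)):
        new_vals, ptrs = [], []
        for j, det in enumerate(frames[t]):
            # Best predecessor in the previous frame.
            i = max(range(len(vals)),
                    key=lambda i: vals[i] - transition_cost(i, j))
            new_vals.append(vals[i] - transition_cost(i, j) + det)
            ptrs.append(i)
        vals, back = new_vals, back + [ptrs]
    # Trace back from the best final detection.
    j = max(range(len(vals)), key=lambda j: vals[j])
    total, path = vals[j], [j]
    for t in range(len(frames) - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    return total, path[::-1]

# Two detections per frame over three frames; zero transition cost.
score, path = best_track([[1.0, 0.2], [0.3, 2.0], [1.5, 0.1]],
                         lambda i, j: 0.0)
```

Each detection is visited once per transition, so the work grows linearly with video length, which is what makes this family of algorithms practical for the long, continuous streams the abstract targets.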
Keywords/Search Tags: Video, Recognition, Spatio-temporal feature extraction, Scalable