
Scalable action recognition in continuous video streams

Posted on: 2013-07-29
Degree: Ph.D
Type: Dissertation
University: University of California, Irvine
Candidate: Pirsiavash, Hamed
Full Text: PDF
GTID: 1458390008485413
Subject: Computer Science
Abstract/Summary:
Activity recognition in video has a variety of applications, including rehabilitation, surveillance, and video retrieval. It is relatively easy for a human to recognize actions in a video after watching it. In many applications, however, the videos are very long (e.g., in life-logging) or require real-time detection (e.g., in human-computer interaction). This motivates us to build computer vision and artificial intelligence algorithms that recognize activities in video sequences automatically.

We address several challenges in activity recognition: (1) computational scalability, (2) spatio-temporal feature extraction, (3) spatio-temporal models, and (4) dataset development.

(1) Computational scalability: We develop "steerable" models that parsimoniously represent a large collection of templates with a small number of parameters. This yields local detectors scalable to large numbers of frames and object/action categories.

(2) Spatio-temporal feature extraction: Feature extraction is difficult in scenes with many moving objects that interact and occlude each other. We tackle this problem in the framework of multi-object tracking, developing linear-time, scalable graph-theoretic algorithms for inference.

(3) Spatio-temporal models: Actions exhibit complex temporal structure, such as sub-actions of variable duration and compositional orderings. Much research on action recognition ignores such structure and instead focuses on K-way classification of temporally pre-segmented video clips (Poppe 2010; Aggarwal and Ryoo 2011). We describe lightweight, efficient grammars that segment a continuous video stream into a hierarchical parse of multiple actions and sub-actions.

(4) Dataset development: Finally, in terms of evaluation, video benchmarks are relatively scarce compared to the abundance of image benchmarks, since it is difficult to collect (and annotate) large-scale, unscripted footage of people doing interesting things. We discuss one solution, introducing a new large-scale benchmark for detecting activities of daily living (ADL) in first-person camera views.
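The "steerable" representation in (1) can be illustrated with a toy sketch. The abstract gives no implementation details, so the numbers, names, and dimensions below are hypothetical; the idea shown is only the general one of storing each template as a few coefficients over a small shared basis, so that scoring all N templates costs K basis projections plus a cheap recombination instead of N full dot-products.

```python
# Hypothetical toy sketch of a "steerable" template bank: many detectors
# represented as linear combinations of a small shared basis.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# K = 2 basis filters of feature dimension 4 (illustrative numbers).
basis = [[1.0, 0.0, 1.0, 0.0],
         [0.0, 1.0, 0.0, 1.0]]

# N = 3 templates, each stored only as K steering coefficients.
coeffs = [[2.0, 0.5],
          [1.0, 1.0],
          [0.0, 3.0]]

def score_all(window):
    """Score one feature window against every template via the basis."""
    b = [dot(f, window) for f in basis]   # K cheap projections
    return [dot(c, b) for c in coeffs]    # N-by-K recombination

window = [1.0, 2.0, 3.0, 4.0]
scores = score_all(window)

# Sanity check: identical to scoring with the explicit N templates.
templates = [[sum(c[k] * basis[k][i] for k in range(2)) for i in range(4)]
             for c in coeffs]
explicit = [dot(t, window) for t in templates]
assert all(abs(s - e) < 1e-9 for s, e in zip(scores, explicit))
```

The saving is the point: with many templates (N large) and a compact basis (K small), the per-window cost drops from N full correlations to K correlations plus an N-by-K matrix-vector product.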
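The linear-time graph-theoretic inference in (2) can likewise be sketched with a toy example. The dissertation's formulation is multi-object (a min-cost-flow network over detections); the sketch below is a deliberate single-track simplification, a shortest-path-style dynamic program over per-frame detections, with all scores, costs, and function names invented for illustration.

```python
# Toy sketch (hypothetical data): best-scoring single track through
# per-frame detections, found by dynamic programming in time linear in
# the number of frames times transitions per frame.

def best_track(frames, transition_cost):
    """frames: list of per-frame lists of detection scores (higher is
    better). transition_cost(i, j): cost of linking detection i in one
    frame to detection j in the next. Returns (total score, indices)."""
    vals = list(frames[0])          # DP value per detection, frame 0
    back = [[None] * len(frames[0])]
    for t in range(1, len(frames)):
        new_vals, ptrs = [], []
        for j, det in enumerate(frames[t]):
            # Best predecessor in the previous frame.
            i = max(range(len(vals)),
                    key=lambda i: vals[i] - transition_cost(i, j))
            new_vals.append(vals[i] - transition_cost(i, j) + det)
            ptrs.append(i)
        vals, back = new_vals, back + [ptrs]
    # Trace back from the best final detection.
    j = max(range(len(vals)), key=lambda j: vals[j])
    total, path = vals[j], [j]
    for t in range(len(frames) - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    return total, path[::-1]

# Two detections per frame over three frames; zero transition cost.
score, path = best_track([[1.0, 0.2], [0.3, 2.0], [1.5, 0.1]],
                         lambda i, j: 0.0)
```

Each detection is visited once per transition, so the work grows linearly with video length, which is what makes this family of algorithms practical for the long, continuous streams the abstract targets.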
Keywords/Search Tags: Video, Recognition, Spatio-temporal feature extraction, Scalable