
Analyzing Human Activities in Videos using Component Based Models

Posted on: 2014-04-02
Degree: Ph.D
Type: Thesis
University: University of Southern California
Candidate: Khan, Furqan Muhammad
Full Text: PDF
GTID: 2458390005492736
Subject: Computer Science
Abstract/Summary:
With cameras getting smaller, better, and cheaper, the amount of video produced these days has increased exponentially. Although not comprehensive by any means, the fact that about 35 hours of video are uploaded to YouTube every minute is indicative of the amount of data being generated. This is in addition to the video recorded for surveillance by grocery stores and by security agencies at airports, train stations, and on streets. Whereas analysis of the video data is the core reason for collecting surveillance footage, services such as YouTube can also use video analysis to improve search and indexing. However, because of the extremely large amount of data generated, manual video analysis is not feasible; therefore, the development of methods that can automatically perform the intelligent task of visual understanding, specifically human activity recognition, has attracted much interest over the past couple of decades. Such capability is also desired to improve human-computer interaction. However, the associated problem of activity description, i.e., identifying the actor, the location, and the object of interaction, has not received much attention despite its importance for surveillance and indexing tasks. In this thesis, I propose methods for automated action analysis, i.e., recognition and description of human activities in videos.

The task of activity recognition is seemingly easy for humans, but it is very difficult for machines. The key challenge lies in modeling human actions and representing the transformation of visual data over time. This thesis contributes action models that are general enough to capture large variations within an action class while allowing robust discrimination between different action classes, together with corresponding inference mechanisms, and uses them to facilitate action description.
I model actions as compositions of several primitive events and use graphical models to evaluate the consistency of an action model with the video input. In the first part of the thesis, I use low-level features to capture the transformation of spatiotemporal data during each primitive event. In the second part, to facilitate description of activities, such as identification of the actor and the object of interaction, I decompose actions using high-level constructs: actors and objects. Primitive components represent properties of actors and their relationships with objects of interaction. Finally, I represent actions as transformations of the actor's limbs (human pose) over time and decompose actions using key poses. I infer the human pose, the object of interaction, and the action for each actor jointly using a dynamic Bayesian network.

This thesis furthers research on the relatively neglected but more comprehensive problem of action analysis, i.e., action recognition together with the associated problem of description. To support the thesis, I evaluated the presented algorithms on publicly available datasets. The performance metrics highlight the effectiveness of my algorithms on datasets that exhibit large variations in execution, viewpoint, actors, illumination, etc.
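To give a flavor of inference over time in a dynamic Bayesian network, the following is a minimal toy sketch: a forward (filtering) pass of the simplest DBN, a hidden Markov model, that maintains a belief over action classes given a sequence of quantized key-pose observations. All action names, pose labels, and probabilities here are invented for illustration; the thesis's actual model is richer, jointly inferring pose, object of interaction, and action per actor.

```python
# Toy DBN (HMM) forward filtering over action classes from key-pose
# observations. Every name and number below is a hypothetical example,
# not the thesis's actual model or parameters.

ACTIONS = ["wave", "drink"]          # hidden action classes
KEY_POSES = ["arm_up", "arm_bent"]   # observed, quantized key poses

# P(action_t | action_{t-1}): actions tend to persist across frames.
TRANSITION = {
    "wave":  {"wave": 0.9, "drink": 0.1},
    "drink": {"wave": 0.1, "drink": 0.9},
}

# P(key_pose_t | action_t): each action prefers certain poses.
EMISSION = {
    "wave":  {"arm_up": 0.8, "arm_bent": 0.2},
    "drink": {"arm_up": 0.3, "arm_bent": 0.7},
}

PRIOR = {"wave": 0.5, "drink": 0.5}  # P(action_0)

def forward_filter(observations):
    """Return P(action_t | observations up to t) for each frame t."""
    beliefs = []
    # Initialize: prior weighted by the first observation's likelihood.
    belief = {a: PRIOR[a] * EMISSION[a][observations[0]] for a in ACTIONS}
    z = sum(belief.values())
    belief = {a: p / z for a, p in belief.items()}
    beliefs.append(belief)
    for obs in observations[1:]:
        # Predict through the transition model, then update with the
        # emission likelihood of the newly observed key pose.
        belief = {
            a: EMISSION[a][obs]
               * sum(belief[b] * TRANSITION[b][a] for b in ACTIONS)
            for a in ACTIONS
        }
        z = sum(belief.values())
        belief = {a: p / z for a, p in belief.items()}
        beliefs.append(belief)
    return beliefs

poses = ["arm_up", "arm_up", "arm_bent", "arm_bent", "arm_bent"]
result = forward_filter(poses)
best = max(result[-1], key=result[-1].get)
print(best)  # the belief shifts as the observed poses change
```

The per-frame normalization keeps each belief a proper distribution; in a fuller model of the kind the abstract describes, the hidden state would be a joint configuration of pose, object, and action rather than a single action label.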
Keywords/Search Tags: Video, Human, Using, Data, Action, Models, Activities