Action recognition using log-covariance matrices of silhouette and optical-flow features

Posted on: 2013-04-25
Degree: Ph.D
Type: Thesis
University: Boston University
Candidate: Guo, Kai
Full Text: PDF
GTID: 2458390008982016
Subject: Engineering
Abstract/Summary:
Algorithms for recognizing human actions in a video sequence are needed in applications such as video surveillance and video search and retrieval. Developing algorithms that are not only accurate but also efficient is challenging due to the complexity of the task and the sheer size of video.

In this thesis, we develop a general framework for compactly representing, quickly comparing, and accurately recognizing actions using empirical covariance matrices of features. With each pixel we associate a feature vector which provides a localized description of the action. This generates a spatio-temporally dense collection of action feature vectors. We use the empirical covariance matrix of this feature vector collection as a low-dimensional representation of the action. We use two supervised learning methods, nearest-neighbor classification and sparse-linear approximation classification, for action recognition using labeled training dictionaries of action covariance matrices. Common to both methods is the novel idea that classification algorithms that have been developed for vectors can be re-purposed for covariance tensors by using a log-nonlinearity to map the convex cone of covariance matrices to the vector space of symmetric matrices.

We illustrate the approach on two types of action feature vectors. One is based on silhouette tunnels of moving objects, and the other is based on optical flow. Action feature vectors of the first type describe the shape of the silhouette tunnel. Action feature vectors of the second type describe various motion characteristics such as velocity, gradient, and divergence.
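
The log-covariance representation and nearest-neighbor classification described above can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the function names, the `eps` regularizer, and the sqrt(2) weighting of off-diagonal entries are assumptions made here so that the Euclidean distance between descriptors equals the Frobenius distance between the matrix logarithms. The per-pixel feature vectors are taken as given, with at least two feature dimensions.

```python
import numpy as np


def log_covariance(features, eps=1e-6):
    """Map an (n_samples, d) array of feature vectors to a log-covariance descriptor.

    Assumes d >= 2. eps is a hypothetical regularizer that keeps the
    covariance strictly positive definite before taking the logarithm.
    """
    cov = np.cov(features, rowvar=False)
    cov = cov + eps * np.eye(cov.shape[0])
    # Matrix logarithm of a symmetric positive-definite matrix via eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(cov)
    log_cov = (eigvecs * np.log(eigvals)) @ eigvecs.T
    # Keep the upper triangle; weight off-diagonal entries by sqrt(2) so the
    # vector norm matches the Frobenius norm of the symmetric matrix.
    rows, cols = np.triu_indices(cov.shape[0])
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return weights * log_cov[rows, cols]


def nearest_neighbor_label(query, dictionary, labels):
    """Return the label of the dictionary descriptor closest to the query."""
    dists = np.linalg.norm(dictionary - query, axis=1)
    return labels[int(np.argmin(dists))]
```

Once the covariance matrices have been mapped into a vector space this way, any vector classifier can in principle be applied to the descriptors; nearest-neighbor is only the simplest choice.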
We demonstrate state-of-the-art recognition performance for both types of action feature vectors on the Weizmann, KTH, YouTube, and low-resolution ICPR-2010 challenge data sets under modest CPU requirements.

We also demonstrate how our approach can be used for sequentially detecting changes in actions in an adaptive, unsupervised manner so as to parse a long video into sub-videos, each containing only a single action class. We use a non-parametric statistical framework to learn the distribution of the nearest-neighbor Riemannian distances between feature covariance matrices of video segments. Then, we use binary hypothesis testing to determine if new video segments include action changes. Our algorithm can detect 98.36% of action boundaries with a 0.19% false-alarm rate.

We conclude by discussing how our framework can be adapted to recognize human interactions, which is usually a more challenging problem due to occlusion between moving individuals. We develop an approach based on dividing human interactions into separate sequences, each containing a single individual, and then combining the estimated action likelihoods for each individual sequence.

The excellent performance of the log-covariance-matrix representation combined with sparse-linear approximation classification demonstrated here for action recognition should encourage the use of this framework for other recognition problems.
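
The change-detection idea summarized above (learn the distribution of nearest-neighbor distances, then test each new segment against it) might look roughly like the sketch below. It is an illustration under assumptions, not the thesis algorithm: it uses the Euclidean distance between log-covariance descriptors (the log-Euclidean metric) as a stand-in for the Riemannian distance, an empirical quantile as the non-parametric threshold, and hypothetical names `nn_distances`, `is_action_change`, and `false_alarm_rate`.

```python
import numpy as np


def nn_distances(descriptors):
    """Leave-one-out nearest-neighbor distance of each descriptor to the others."""
    diffs = descriptors[:, None, :] - descriptors[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)  # exclude each descriptor's distance to itself
    return dists.min(axis=1)


def is_action_change(new_descriptor, reference, false_alarm_rate=0.05):
    """Binary test: does the new segment lie unusually far from the reference set?

    reference is an (n_segments, k) array of descriptors from the current action.
    The threshold is the empirical (1 - false_alarm_rate) quantile of the
    reference set's own nearest-neighbor distances (an assumed design choice).
    """
    threshold = np.quantile(nn_distances(reference), 1.0 - false_alarm_rate)
    new_dist = np.linalg.norm(reference - new_descriptor, axis=1).min()
    return bool(new_dist > threshold)
```

Here a segment is flagged as a boundary when its nearest-neighbor distance to recent segments exceeds what is typical among those segments themselves; lowering `false_alarm_rate` raises the threshold and trades missed boundaries for fewer false alarms.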
Keywords/Search Tags:Action, Feature, Covariance matrices, Video, Using, Silhouette, Framework, Classification