
Research On Computer Vision-based Human Action Detection And Recognition

Posted on: 2011-08-20    Degree: Doctor    Type: Dissertation
Country: China    Candidate: C L Jiang    Full Text: PDF
GTID: 1118360308964138    Subject: Computer application technology
Abstract/Summary:
Human action detection and recognition is an important research topic in human movement analysis, and it is receiving more and more attention in computer vision because of its wide range of potential applications, such as smart surveillance, human-computer interaction, content-based video retrieval and image compression, and because of its significant societal and economic value. Owing to the non-rigid nature of human movements, the intra-class variations are usually large while the inter-class variations are small under different conditions, so detecting and recognizing human actions in videos is more difficult than detecting other objects.

This thesis focuses on human action detection and recognition from videos. Following the detection and recognition pipeline for human actions and the requirements of practical applications, the research content of this thesis includes: (a) human detection in indoor environments; (b) target tracking in complex scenarios; (c) target tracking under occlusion; (d) human action detection and recognition from a moving camera; (e) human action detection and recognition with a dynamic background. Motivated by these issues, corresponding solutions are proposed. The main content and novelties of this thesis are summarized as follows:

1. Because of self-occlusions and occlusions by other objects, different viewpoints of observation, variations in skin color across persons, and the increasing use of wide-angle cameras, human detection algorithms that rely on color and shape cues may fail. Motivated by these issues and by the need for systematic extension, a blackboard-based algorithm for human detection in indoor environments is proposed. Human detection is performed by excluding other, non-human indoor objects. The algorithm is improved by adding or removing knowledge-source modules, which facilitates extension of the human detection system. Experimental results show that the blackboard-based method for human detection is feasible.

2. A target tracking algorithm based on nonparametric clustering and multi-scale images is presented and applied to complex backgrounds. First, a modified non-parametric color-clustering method is employed to automatically partition the color space of a tracked object, and a Gaussian function is used to model the spatial information of each bin of the color histogram; together these define the appearance model of the target. Next, the Bhattacharyya coefficient is employed to derive a function describing the similarity between the target model and a target candidate. Then, a coarse-to-fine search over multi-scale images is employed to localize the tracked object spatially. Finally, the optimal bandwidth of the kernel function is obtained by maximizing the lower bound of a log-likelihood function and is used to estimate the scale of the tracked object. Experimental results show that the proposed algorithm outperforms the classical mean shift tracker. In addition, a multi-person tracking algorithm based on human detection and the improved mean shift tracker is presented to deal with the occlusions that occur during tracking. The key to handling occlusion is to associate the reliable tracks before occlusion with the temporary tracks after occlusion. An association likelihood based on the appearance, size and location of the tracked object is defined, and the optimal association is computed with the Hungarian algorithm. Experimental results show that the tracking algorithm is effective.
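To make the track association step in part 2 concrete, the sketch below combines an appearance term (the Bhattacharyya coefficient between normalized color histograms), a location term and a size term into an association likelihood, and solves the resulting assignment problem with the Hungarian algorithm via SciPy. The field names, the Gaussian weighting and the parameter values are illustrative assumptions rather than the exact formulation used in the thesis.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment  # Hungarian algorithm

def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two normalized color histograms."""
    return float(np.sum(np.sqrt(p * q)))

def association_cost(track, candidate, sigma_pos=30.0, sigma_size=0.5):
    """Hypothetical association likelihood from appearance, location and size.

    `track` and `candidate` are dicts with 'hist' (normalized histogram),
    'center' (x, y) and 'area'; the weighting here is illustrative only."""
    appearance = bhattacharyya(track['hist'], candidate['hist'])
    dist = np.linalg.norm(np.asarray(track['center']) - np.asarray(candidate['center']))
    location = np.exp(-dist ** 2 / (2 * sigma_pos ** 2))
    size = np.exp(-np.log(candidate['area'] / track['area']) ** 2 / (2 * sigma_size ** 2))
    likelihood = appearance * location * size
    return -np.log(likelihood + 1e-12)  # the Hungarian solver minimizes cost

def associate(reliable_tracks, temporary_tracks):
    """One-to-one assignment of tracks before occlusion to tracks after occlusion."""
    cost = np.array([[association_cost(t, c) for c in temporary_tracks]
                     for t in reliable_tracks])
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))
```

Each returned pair (i, j) links the i-th reliable track before occlusion to the j-th temporary track after occlusion; unmatched temporary tracks can then be used to start new tracks.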
3. To handle cases such as a moving camera or a dynamic background, an approach based on a shape-motion action prototype tree is introduced for action recognition. During training, action prototypes are learned via k-means clustering, and a binary prototype tree is constructed over the set of learned action prototypes via hierarchical k-means clustering; the action prototypes are stored in the leaf nodes of the binary prototype tree. During testing, humans are first detected and tracked using appearance information to obtain a rough location of the actor. A joint probability optimization is then performed to refine the location of the actor and identify the corresponding prototype, and actions are recognized based on dynamic time warping. An HMM-based frame-to-prototype matching scheme is also introduced and compared with the tree-based frame-to-prototype matching scheme. Experimental results demonstrate that our approach achieves recognition rates of 91.07% on the Keck gesture dataset, 100% on the Weizmann action dataset, 95.77% on the KTH action dataset and 99.23% on the Checkout Counter Dataset.

4. A tree-based approach that integrates action detection, recognition and segmentation is proposed for the moving camera and dynamic background. During training, a set of action prototypes is first learned by k-means clustering, and a binary tree model is then constructed. Each tree node stores a rejection threshold learned for fast matching during training and testing. Each leaf node also stores a list of learned parameters: the frame indices of the training descriptors that best match the leaf node, and an action class distribution. During testing, an action is first localized by matching the feature descriptors extracted from sliding windows against the learned tree with a fast matching method, followed by global filtering to refine the location of the action. The action is recognized by maximizing the sum of the joint probabilities of the action category and the action prototype over the test frames, and it is segmented using the segmentation mask computed from the frame indices of the training data stored in the matched leaf nodes. Experimental results show that our approach achieves recognition rates of 100% on the CMU action dataset and 100% on the Weizmann dataset.

5. A discriminative tree-based Hough voting technique for multi-class action detection and recognition is proposed to address the moving-camera and dynamic-background issues. During training, a pair of localization trees using local motion and appearance features is learned, and a recognition tree using joint HOG-flow features is learned via a hierarchical label-consistent k-means clustering algorithm. A class distribution is stored at each tree node, and each node of the localization trees also stores a set of offsets relative to the object center. During testing, a small number of regions most likely to contain the actor are found by local-feature voting with the localization trees, and holistic features are then extracted from these regions. Finally, the action is recognized by holistic-feature voting with the recognition tree. Experimental results demonstrate that our approach outperforms the state of the art on the Keck gesture dataset, the CMU action dataset and the KTH action dataset.
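The binary prototype trees used in parts 3 and 4 can be approximated by hierarchical 2-means clustering over the learned action prototypes, with prototypes stored at the leaves and matching performed by descending toward the closer child centroid. The sketch below, using scikit-learn, is a minimal illustration under that assumption; the class and function names are hypothetical, and the rejection thresholds, frame-index lists and class distributions stored at the nodes in the thesis are omitted.

```python
import numpy as np
from sklearn.cluster import KMeans

class PrototypeNode:
    """Node of a binary prototype tree; leaves store a single action prototype."""
    def __init__(self):
        self.prototype = None   # set on leaf nodes only
        self.centroid = None    # mean descriptor of all prototypes under this node
        self.left = None
        self.right = None

def build_tree(prototypes):
    """Hierarchical 2-means over an (n, d) array of learned action prototypes."""
    node = PrototypeNode()
    node.centroid = prototypes.mean(axis=0)
    if len(prototypes) == 1:
        node.prototype = prototypes[0]
        return node
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(prototypes)
    if labels.min() == labels.max():      # degenerate split: stop and make a leaf
        node.prototype = prototypes[0]
        return node
    node.left = build_tree(prototypes[labels == 0])
    node.right = build_tree(prototypes[labels == 1])
    return node

def match(node, descriptor):
    """Descend toward the closer child centroid and return the leaf prototype."""
    if node.prototype is not None:
        return node.prototype
    d_left = np.linalg.norm(descriptor - node.left.centroid)
    d_right = np.linalg.norm(descriptor - node.right.centroid)
    return match(node.left if d_left <= d_right else node.right, descriptor)
```

In use, each test frame's shape-motion descriptor would be passed to match() to find its nearest prototype, and the resulting frame-to-prototype sequence would feed the dynamic time warping or joint-probability stage described above.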
Keywords/Search Tags: Human Action Detection and Recognition, Target Tracking, Human Detection, Action Prototype Tree, Label Consistent k-means Clustering