
Human Action Recognition In Video Sequence

Posted on: 2016-03-08    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Y B Chen    Full Text: PDF
GTID: 1108330482450145    Subject: Communication and Information System
Abstract/Summary:
In recent years, automatic human action recognition has drawn much attention in the field of video analysis due to growing demand from applications such as video surveillance, entertainment environments and healthcare systems. In this dissertation, we focus on several drawbacks of the core technologies in existing human action recognition systems and propose methods to solve them. Our main work includes:

1) We propose a novel spatio-temporal interest point detector. Sparse interest point detectors and dense sampling are currently the two most widely used approaches. However, the former extracts too few points to capture enough information for describing human actions in realistic scenes with camera motion and cluttered backgrounds, while the latter samples points at regular positions over multiple scales, which greatly increases computational complexity, and samples pixels uniformly without distinguishing foreground from background, introducing excessive noise that hurts recognition accuracy. To address these problems, we propose a spatio-temporal interest point detector based on vorticity. The extracted points lie mainly in the regions surrounding the joints of moving subjects. Because the detector is derived from the differentials of the flow field, it suppresses most camera motion; moreover, it involves only subtractions of flow values, so it is very efficient. Since the detected points are locally dense, we can extract features from windows of randomly selected size to achieve scale invariance without invoking multiple spatio-temporal scales, which further reduces the computational complexity of our method. Experiments on several common datasets show that the proposed detector achieves performance comparable to state-of-the-art dense sampling with less than half the computation time, a good trade-off between recognition accuracy and computational complexity that makes action recognition more practical for real applications.

2) We propose a novel sparse coding model that provides more discriminative feature representations. Traditional sparse representation requires solving an L1-norm optimization problem over the whole set of dictionary atoms, which is computationally expensive. Moreover, because it pursues sparsity alone, a data sample may be represented by very different subsets of dictionary atoms, so the resulting sparse codes can vary significantly in their activation sets, which harms recognition. We therefore propose a sparse coding model with non-negative and locality constraints (SCNL). The non-negative constraint ensures that every data sample lies in the convex hull of its neighbors; the locality constraint lets a data sample be represented only by its related neighboring atoms; and the sparsity constraint keeps the number of dictionary atoms involved in the representation as small as possible. The proposed SCNL model captures the global subspace structure of the data better than classical sparse coding and is more robust to noise than locality-constrained linear coding. Extensive experiments on three benchmark human action datasets verify the significant advantages of the proposed SCNL model.
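To make the vorticity-based detector of contribution 1 concrete, the sketch below shows one plausible way to select interest points by thresholding the vorticity (curl) of a dense optical-flow field. The function name, the threshold tau, and the assumption that per-frame flow components u and v have already been computed are illustrative; this is not the dissertation's exact implementation.

    # Hypothetical sketch: pick interest points where the flow vorticity is large.
    # Forward differences need only subtractions of neighbouring flow values,
    # and near-uniform camera translation has vorticity close to zero.
    import numpy as np

    def vorticity_interest_points(u, v, tau=0.5):
        """Return (row, col) indices whose vorticity magnitude exceeds tau.

        u, v : 2-D arrays, horizontal/vertical optical-flow components of a frame.
        tau  : assumed threshold controlling how many points are kept.
        """
        dv_dx = v[:, 1:] - v[:, :-1]        # shape (H, W-1)
        du_dy = u[1:, :] - u[:-1, :]        # shape (H-1, W)
        w = dv_dx[:-1, :] - du_dy[:, :-1]   # vorticity on a common (H-1, W-1) grid
        rows, cols = np.nonzero(np.abs(w) > tau)
        return np.stack([rows, cols], axis=1)

    if __name__ == "__main__":
        # Random flow stands in for real optical flow in this toy example.
        rng = np.random.default_rng(0)
        u = rng.normal(size=(120, 160)).astype(np.float32)
        v = rng.normal(size=(120, 160)).astype(np.float32)
        print(vorticity_interest_points(u, v, tau=2.0).shape)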
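For contribution 2, one plausible way to write an objective combining the three constraints described for SCNL (a hedged sketch, not necessarily the exact formulation in the dissertation) is, for a sample x with dictionary D and code c:

    \min_{\mathbf{c}} \; \|\mathbf{x} - \mathbf{D}\mathbf{c}\|_2^2
        + \lambda \, \|\mathbf{d} \odot \mathbf{c}\|_1
    \quad \text{s.t.} \quad \mathbf{c} \ge 0, \; \mathbf{1}^{\top}\mathbf{c} = 1

Here d is a locality adaptor whose entries grow with the distance between x and the corresponding dictionary atoms, so distant atoms are penalized more heavily; \odot is element-wise multiplication; the non-negativity and sum-to-one constraints keep x in the convex hull of the selected atoms; and the weighted L1 term enforces both locality and sparsity. The symbol \lambda and the sum-to-one constraint are assumptions made for illustration.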
3) We propose a novel, robust and discriminative graph model that improves action recognition by exploiting the large amount of unlabeled data available in practical applications. Most classification methods are strongly supervised: to obtain good model parameters and good performance, they need a large number of labeled samples during training. However, data annotation is laborious and tedious work, especially for videos. Graph-based models, in which both labeled and unlabeled data are used to reveal the local and global structure of a dataset, have therefore received considerable attention in computer vision. Given the vast variability in action appearance caused by many realistic factors, constructing a discriminative graph is the key step in a graph-based method for recognizing human actions. Traditional graph construction methods, including KNN and ε-ball methods as well as the recently popular L1-graph, have several problems. First, they all rely on Euclidean distances, which may not be a good metric for comparing actions when the underlying manifold structure of actions is curved. Second, existing graphs are constructed from a single modality or from cascaded modalities, which fails to exploit the specific information contained in different modalities. We therefore propose a novel discriminative multi-modality non-negative sparse (DMNS) graph for action recognition. In this model, features are first projected into a Mahalanobis space by learned transformations in which inter-class data are pushed far apart and intra-class data are pulled together, enhancing the discriminative power of the model. Furthermore, we combine multiple modalities with a joint sparsity constraint in the graph model, which not only exploits the specific information contained in different modalities but also limits the influence of noise and outliers, enhancing the robustness of the graph. Extensive experiments on two benchmark datasets demonstrate the advantages of the proposed DMNS-graph method over state-of-the-art methods.
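As a rough illustration of how such a graph could combine a learned metric with joint sparsity across modalities (again a sketch under stated assumptions, not the dissertation's exact objective), the weights linking sample i to the remaining samples might be obtained per sample by solving something of the form:

    \min_{\mathbf{W}_i \ge 0} \; \sum_{m=1}^{M}
        \big\| \mathbf{L}_m \mathbf{x}_i^{(m)} - \mathbf{L}_m \mathbf{X}_{-i}^{(m)} \mathbf{w}_i^{(m)} \big\|_2^2
        + \lambda \, \| \mathbf{W}_i \|_{2,1}

where \mathbf{x}_i^{(m)} is the feature of sample i in modality m, \mathbf{X}_{-i}^{(m)} collects the features of the other samples, \mathbf{L}_m is a learned Mahalanobis transformation for that modality, and \mathbf{W}_i stacks the per-modality coefficient vectors \mathbf{w}_i^{(m)} as columns, so the L_{2,1} norm encourages the modalities to agree on which neighbors receive non-zero weight. The notation and the choice of the L_{2,1} norm here are assumptions made for illustration.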
Keywords/Search Tags: Human Action Recognition, Spatio-temporal Interest Point, Sparse Coding, Graph Model, Multi-modality