
Novel Algorithms for Human Action Recognition in Video

Posted on: 2018-09-20    Degree: Ph.D    Type: Thesis
University: New York University Tandon School of Engineering    Candidate: Xu, Tiantian    Full Text: PDF
GTID: 2448390002999560    Subject: Computer Science
Abstract/Summary:
Human action recognition from videos plays a crucial role in applications such as video annotation and retrieval, intelligent surveillance, sports video analysis, and human-computer interaction. It is a challenging problem in computer vision due to the highly variable nature of human actions. Variations in scale, illumination, viewpoint, and background in the video make the problem even more challenging. In this thesis, we propose new algorithms for human action recognition based on novel video features and descriptors that are effective in discriminating complex actions. We also propose a new learning method for cross-dataset human action recognition.

In Chapter 2, we propose a new motion feature called difference HOG (dHOG) and, based on it, develop a new video descriptor that encodes the pairwise spatial co-occurrences of motion cells and their spatial displacements within individual frames using a dictionary. Temporal co-occurrence matrices are then used to capture the temporal co-occurrences of code words, and the final descriptor is built by concatenating the Bag-of-Words (BoW) representation of the code words with the PCA-reduced temporal co-occurrence matrices.

In Chapter 3, we build video descriptors by dividing an action video into temporal segments and extracting low-level features (HOG, HOF, and MBH) from individual segments. An ensemble of many-to-one encoders is then used to learn generalized high-level features from the individual segments. We introduce two new algorithms for unsupervised segmentation of a video into temporal segments that correspond to sub-actions. The first performs K-means clustering of the low-level features, followed by iterative adjustment of segment boundaries. The second uses Adaptive Affinity Propagation to cluster the low-level features.
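The first segmentation algorithm can be illustrated with a minimal sketch: cluster per-frame features with K-means, then smooth the frame labels so segments stay temporally contiguous. All names here are hypothetical, the boundary smoothing is a simple majority-vote stand-in for the thesis's iterative boundary adjustment, and centers are initialized deterministically for reproducibility.

```python
import numpy as np

def kmeans_segment(frame_feats, k=3, iters=20):
    """Cluster per-frame features (n x d) into k groups with K-means,
    then enforce temporal contiguity by majority vote in a sliding
    window (a simple stand-in for iterative boundary adjustment)."""
    n = len(frame_feats)
    # deterministic init: centers from frames spread evenly over time
    centers = frame_feats[np.linspace(0, n - 1, k).astype(int)].copy()
    for _ in range(iters):
        # distance of every frame to every center: (n, k)
        d = np.linalg.norm(frame_feats[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = frame_feats[labels == j].mean(axis=0)
    # temporal smoothing: each frame takes the majority label nearby
    smoothed = labels.copy()
    for t in range(n):
        window = labels[max(0, t - 2): t + 3]
        smoothed[t] = np.bincount(window).argmax()
    return smoothed
```

On a sequence whose frames fall into a few well-separated feature clusters, the returned labels form contiguous runs, one per putative sub-action.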
Dynamic Time Warping is then used to iteratively merge segments, producing a hierarchical tree representation of the action video.

In Chapter 4, we tackle the problem of cross-dataset action recognition by using knowledge from a known dataset to aid the training and classification of a new dataset that is not fully annotated. The main challenge in cross-dataset action recognition is the large intra-class variance introduced by different video sources. We propose a transfer-learning method based on a dual many-to-one encoder framework that trains one encoder on the source dataset and a second on the target dataset in parallel. The trained encoders map features from the two datasets into a generalized feature space, enabling the transfer of knowledge between them. During training, the generalized features extracted from the source dataset augment the training set of the insufficiently annotated target dataset.

We applied our algorithms to several challenging benchmark datasets to demonstrate their effectiveness. Our proposed algorithms outperformed many state-of-the-art methods in terms of recognition accuracy, most notably beating the state-of-the-art result on the challenging HMDB51 dataset by over 20% when the second segmentation-based method of Chapter 3 is used.
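The segment-merging step relies on a sequence distance that tolerates speed variations between sub-actions; the classic Dynamic Time Warping recurrence can be sketched as follows (a minimal illustration with Euclidean local cost, not the thesis's full merging procedure):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two feature sequences
    a (n x d) and b (m x d), using Euclidean local cost."""
    n, m = len(a), len(b)
    # D[i, j] = cost of best alignment of a[:i] with b[:j]
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # step from match, insertion, or deletion
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Adjacent segments whose feature sequences have a small DTW distance are good candidates for merging, which is what drives the bottom-up construction of the hierarchical tree.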
Keywords/Search Tags: Action recognition, Video, Algorithms, Dataset, Challenging, Used, Chapter