
Simple and Complex Human Action Recognition in Constrained and Unconstrained Video

Posted on: 2019-04-15
Degree: Ph.D.
Type: Dissertation
University: University of Windsor (Canada)
Candidate: Mohammadi Nejad, Eman
Full Text: PDF
GTID: 1478390017487504
Subject: Artificial Intelligence
Abstract/Summary:
Human action recognition plays a crucial role in visual learning applications such as video understanding and surveillance, video retrieval, human-computer interaction, and autonomous driving systems. A variety of methodologies have been proposed for human action recognition by developing low-level features along with bag-of-visual-words models. However, much less research has been performed on the combination of the pre-processing, encoding, and classification stages. This dissertation focuses on improving action recognition performance through ensemble learning, a hybrid classifier, hierarchical feature representation, and key action perception.

Action variation is one of the crucial challenges in video analysis and action recognition. We address this problem by proposing the hybrid classifier (HC) to discriminate between actions that contain similar forms of motion features, such as walking, running, and jogging. In addition, we show and prove that fusing various appearance-based and motion features can boost both simple and complex action recognition performance.

The next part of the dissertation introduces the pooled-feature representation (PFR), which is derived from a double-phase encoding (DPE) framework. Considering that a given unconstrained video is composed of a sequence of simple frames, the first phase of DPE generates temporal sub-volumes from the video and represents them individually using the proposed improved rank pooling (IRP) method. The second phase constructs the pool of features by fusing the represented vectors from the first phase. The pool is compressed and then encoded to produce the video-parts vector (VPV). The DPE framework allows distilling the video representation and hierarchically extracting new information. Compared with recent video encoding approaches, VPV can preserve higher-level information through standard encoding of low-level features in two phases. Furthermore, the encoded vectors from both phases of DPE are fused, along with a compression stage, to develop PFR.

Real-world long-shot video streams contain complicated content and editing artifacts. However, conventional action recognition frameworks are only capable of analyzing pre-segmented short-shot videos. The last chapter of this dissertation focuses on key action perception (KAP) along with robust video action clustering for unconstrained and constrained video analysis. KAP includes two classifiers: the first detects the key action among multiple temporal clusters, and the second recognizes the key action detected by the first. Video action clustering is the essential pre-processing step for KAP implementation. The sequential relationship of the video frames and the complexity of motion representations pose challenges for video action clustering. We propose two novel multi-layer subspace video action clustering (ML-VAC) techniques to encode the sequential relationships of constrained and unconstrained video frames without any prior knowledge of the number of temporal clusters in a given video.

We evaluate the proposed techniques on simple and complex datasets, such as UCF50, HMDB51, Hollywood2, KTH, Weizmann, URADL, UCF101, Olympic Sports, and Keck Gestures. The employed datasets contain constrained and unconstrained video samples to test the proposed strategies under different conditions.
Keywords/Search Tags:Video, Action, Constrained and unconstrained, Simple and complex, DPE, Proposed