
Multi-modal Human Action Recognition

Posted on: 2016-01-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y F Feng
Full Text: PDF
GTID: 1108330470967840
Subject: Computer Science and Technology

Abstract/Summary:
With the emergence of various novel sensors such as the Microsoft Kinect in recent years, multi-modal human action recognition has become a new hotspot of computer vision research. The research results can be applied to a wide range of real-world applications such as intelligent video surveillance, interactive entertainment, and video content analysis and retrieval. In this thesis, we study multi-modal human action recognition along its three major steps — multi-modal data preprocessing, feature extraction and selection, and human action recognition — investigating the related techniques below.

Firstly, to deal with the completion and denoising of 3D human motion data, three novel algorithms are proposed in the multi-modal data preprocessing step, described as follows.

We propose a novel method called ℓ1-sparse representation of missing markers prediction (L1-SRMMP) to deal with the missing markers problem in human motion capture. L1-SRMMP converts the conventional missing markers problem into an optimization problem that seeks a sparse representation of the observable data of the incomplete pose. To mitigate the limited capacity of the training dataset, we also propose a representation coefficient weighted updating (RCWU) algorithm to update the training dataset, which effectively improves the stability of our prediction algorithm.

We propose a data-driven, robust human motion denoising approach that mines the spatial-temporal patterns and the structural sparsity embedded in human motion data. Compared with other data-driven methods, our approach does not require a specifically chosen training dataset, which makes it much easier to use in real-world applications.
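The sparse-representation idea behind L1-SRMMP can be illustrated with a minimal sketch: sparse-code the observed coordinates of an incomplete pose over a dictionary of training poses, then reconstruct the missing coordinates from the same code. This is not the thesis's algorithm — the dictionary, the ISTA solver, and the `lam` parameter here are illustrative assumptions.

```python
import numpy as np

def ista_lasso(A, b, lam=0.05, n_iter=500):
    """Solve min_w 0.5*||A w - b||^2 + lam*||w||_1 by ISTA
    (gradient step followed by soft thresholding)."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = w - A.T @ (A @ w - b) / L      # gradient step
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # shrink
    return w

def predict_missing(D, pose, observed):
    """D: (coords, n_training_poses) dictionary; pose: incomplete pose
    vector; observed: indices of trusted coordinates. Sparse-code the
    observed part, then fill in the missing part from the code."""
    obs = np.asarray(observed)
    w = ista_lasso(D[obs], pose[obs])
    recon = D @ w                          # full-pose reconstruction
    out = pose.copy()
    miss = np.setdiff1d(np.arange(D.shape[0]), obs)
    out[miss] = recon[miss]                # only overwrite missing coords
    return out
```

Note that the observed coordinates are left untouched; only the missing markers are replaced by the dictionary reconstruction, mirroring the role of the trusted data in the thesis.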
In a large number of experiments on simulated and real noisy data, our approach consistently outperforms its counterparts, and its outputs are much more stable than theirs.

We propose a non-data-driven human motion data refinement algorithm that simultaneously solves, in a joint framework, the two sub-problems involved in human motion data refinement (i.e., missing data prediction and motion denoising). The low-rank structure and temporal stability of human motion data, as well as the noise effect, are all taken into account in designing our objective function. An efficient optimization method derived from the augmented Lagrange multiplier algorithm is presented to solve the proposed model. Besides, a trusted data detection method is introduced to improve the degree of automation in processing the entire dataset and to boost performance. Extensive experiments and comparisons with other methods demonstrate the effectiveness of our approaches for both missing data prediction and denoising.

Secondly, in the feature extraction and selection step, we propose an adaptive unsupervised multi-view feature selection (AUMFS) algorithm to select a compact yet discriminative feature subset from the original high-dimensional heterogeneous multi-view features. This method addresses the problem that most existing feature selection algorithms are designed for single-view features, so they cannot handle multi-view features directly and fail to exploit the correlation between different views. Meanwhile, we propose a new active learning algorithm called locally regressive optimal design (LROD) to actively select unlabeled multi-modal data for improving algorithm performance. We applied LROD to relevance feedback-based social image retrieval.
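The low-rank property exploited by the refinement algorithm can be sketched with a much simpler procedure than the thesis's ALM solver: alternately fill the missing entries of the frames-by-coordinates motion matrix with the current low-rank estimate and re-project onto rank-r matrices via a truncated SVD. The function name and the fixed `rank` parameter are illustrative assumptions, not the thesis's formulation.

```python
import numpy as np

def lowrank_refine(X, mask, rank=2, n_iter=300):
    """X: (frames, joint coords) motion matrix; mask: True where the
    entry is observed/trusted. Alternate between (a) truncating the
    current guess to rank r and (b) restoring the observed entries."""
    Z = np.where(mask, X, 0.0)             # zero-fill the missing entries
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # rank-r projection
        Z = np.where(mask, X, L)           # keep trusted observed data
    return L
```

For genuinely low-rank motion data this alternation typically converges to a completion that agrees with the observed entries; the thesis's joint objective additionally handles noise and temporal stability, which this sketch omits.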
Experimental results show that LROD outperforms its counterparts in this application.

Finally, for human action recognition, a semantically constrained multi-modal feature fusion and action recognition algorithm and a multiple skeletal features fusion and selection algorithm are proposed. The former utilizes the fact that the different modalities of RGB-D data carry the same semantic meaning, and exploits the strong correlation and complementary information between modalities; it learns a high-level semantic feature from the multi-modal mid- and low-level features for multi-modal human action recognition. The latter first extracts multiple kinds of visual features from 3D human skeleton sequence data, and then uses feature fusion and selection algorithms to construct a more compact and discriminative feature representation. Extensive experiments on public datasets show that our approach requires little computational cost and storage space and, more importantly, achieves recognition accuracy close to or higher than that of existing work.
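One common kind of skeletal feature of the sort described above is the set of per-frame pairwise joint distances, which is invariant to translation of the whole body; concatenating several such blocks is the simplest form of early feature fusion. The thesis does not specify these particular features — the function names and feature choice here are illustrative assumptions.

```python
import numpy as np

def pairwise_joint_distances(skeleton_seq):
    """skeleton_seq: (frames, joints, 3) array of 3D joint positions.
    Returns per-frame pairwise joint distances, a simple
    translation-invariant skeletal feature of size J*(J-1)/2."""
    F, J, _ = skeleton_seq.shape
    diffs = skeleton_seq[:, :, None, :] - skeleton_seq[:, None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)     # (F, J, J) distance matrix
    iu = np.triu_indices(J, k=1)               # upper triangle, no diagonal
    return dists[:, iu[0], iu[1]]              # (F, J*(J-1)/2)

def fuse_features(*feature_blocks):
    """Early fusion: concatenate per-frame feature blocks along the
    feature axis before any selection step."""
    return np.concatenate(feature_blocks, axis=1)
```

A selection algorithm like the one proposed in the thesis would then prune the fused representation down to a compact, discriminative subset.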
Keywords/Search Tags: Human Action Recognition, Multi-modal Data, Sparse Representation, Sparse Coding, Dictionary Learning, Data-driven, Motion Capture, Human Motion Data, Feature Selection, Active Learning, Low-rank Matrix Completion