Human Action Recognition Via Fusing Multi-model Features From RGB-D Videos

Posted on: 2016-08-21
Degree: Master
Type: Thesis
Country: China
Candidate: K H Chen
GTID: 2308330473960838
Subject: Signal and Information Processing
Abstract/Summary:
Human action recognition remains a challenging problem in computer vision and pattern recognition. RGB-D sensors with depth perception, such as Microsoft Kinect, capture rich multi-modal visual data, including RGB texture, depth-derived point clouds, and skeletons, which help action recognition cope with challenges such as occlusion and variation between behaviors. This thesis proposes a human action recognition approach that fuses multi-modal visual features from RGB-D videos.

To overcome the limited representational power of single-modal features, two features are proposed for fusing the multi-modal visual information of human actions: DenseMP, which is robust to human motion, and SHOPC, which captures geometric information. The DenseMP descriptor, built on the traditional MovingPose feature and dense trajectories, effectively addresses the problems of limited motion-region coverage and sensitivity to disturbance, and also avoids the unstable motion trajectories of traditional dense trajectories. The SHOPC feature combines the low-level point cloud description strategy of the traditional HOPC feature with an adaptive space-time pyramid scheme. SHOPC descriptors describe geometric appearance and complement the motion feature, compensating for the weakness of DenseMP in classifying action categories with similar motions, while preserving the spatio-temporal distribution of point cloud geometric information in a viewpoint-invariant way. Experimental results show that, compared with some traditional methods, the proposed DenseMP and SHOPC features effectively extract motion cues and geometric information, respectively.
In addition, using multiple kernel learning to fuse these two features with the HOG3D feature from RGB texture achieves better action classification performance.

To address the fact that a single semantic class of human actions can contain several visual categories, we propose a discriminative classification model, Exemplars-MKL-ELM. Compared with the traditional K-means algorithm, the contrast data mining technique yields a set of representative exemplars that are more compact within classes and more discriminative between classes. A weighted MKL-ELM classifier with a single exemplar is proposed to handle the unbalanced class distribution in the training samples, while the multiple kernel learning strategy effectively fuses the multi-modal visual features of each sample for classification. The hidden-layer parameters of the single-hidden-layer ELM classifier need not be tuned and can be fixed once randomly generated, while the output-layer parameters are obtained by linear inversion. Thus, compared with the similar Exemplars-SVM model, the proposed Exemplars-ELM model can achieve high classification accuracy with higher efficiency. Experiments show that, compared with the similar Exemplars-SVM model, the proposed Exemplars-MKL-ELM classification decision model has significant advantages in both computational efficiency and classification accuracy.

In the Exemplars-MKL-ELM decision model, a gap remains between classification efficiency and real-time requirements during the testing phase. This thesis therefore adopts a greedy hierarchical prediction strategy, using the representative exemplars obtained by contrast data mining, to solve the online human action recognition problem. First, the multi-kernel KNN (MK-KNN) method classifies testing samples at a coarse-grained level.
Second, the Exemplars-MKL-ELM model re-classifies, at a fine-grained level, the testing samples that received low decision confidence from the MK-KNN classification. Experimental results indicate that, compared with prediction using the Exemplars-MKL-ELM model alone, the proposed greedy hierarchical prediction strategy effectively balances computational efficiency and classification accuracy and satisfies the requirements of real-time processing.
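
The coarse-to-fine prediction described above can be illustrated with a minimal Python sketch. It is not the thesis implementation. The RBF base kernel, the fixed kernel weights `betas`, the vote-fraction confidence measure, the `threshold` value, and every function name are assumptions made for illustration, and `fine_classifier` is a placeholder for the Exemplars-MKL-ELM model.

```python
import math
from collections import Counter


def rbf(a, b, gamma=1.0):
    """RBF kernel between two feature vectors of one modality (assumed base kernel)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def combined_kernel(sample_a, sample_b, betas):
    """Weighted sum of per-modality kernels; the weights are fixed here, not learned."""
    return sum(w * rbf(fa, fb) for w, fa, fb in zip(betas, sample_a, sample_b))


def mk_knn(sample, exemplars, betas, k=3):
    """Coarse stage: vote among the k exemplars most similar under the combined kernel.

    Returns (label, confidence), where confidence is the winning vote fraction
    (a hypothetical confidence measure, chosen for simplicity).
    """
    ranked = sorted(
        exemplars,
        key=lambda e: combined_kernel(sample, e[0], betas),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    label, count = votes.most_common(1)[0]
    return label, count / k


def hierarchical_predict(sample, exemplars, betas, fine_classifier,
                         threshold=0.9, k=3):
    """Accept the coarse label when it is confident, otherwise defer to the fine stage."""
    label, confidence = mk_knn(sample, exemplars, betas, k)
    if confidence >= threshold:
        return label, "coarse"
    # Low-confidence samples are re-classified by the slower, finer model.
    return fine_classifier(sample), "fine"
```

Each sample here is a tuple of per-modality feature vectors, and each exemplar is a `(sample, label)` pair. Only samples whose neighbors disagree reach the more expensive fine classifier, which is where the efficiency gain of such a cascade would come from.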
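
The ELM training idea mentioned in the abstract (hidden-layer parameters fixed once randomly generated, output weights obtained by linear inversion) can also be sketched in a few lines. This is a plain single-hidden-layer ELM, not the weighted MKL-ELM of the thesis. The sigmoid activation, the small ridge term added for numerical stability, the hidden-layer size, and all names are assumptions made for illustration.

```python
import math
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    a is an n x n list of lists, b is an n x m list of lists.
    """
    n = len(a)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def hidden_output(x, weights, biases):
    """Sigmoid activations of the random hidden layer for one sample."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(ws, x)) + b)))
        for ws, b in zip(weights, biases)
    ]


def train_elm(xs, ys, n_classes, n_hidden=20, ridge=1e-3, seed=0):
    """Fit output weights in closed form; the hidden layer is never tuned."""
    rng = random.Random(seed)
    dim = len(xs[0])
    # Hidden-layer parameters: drawn once at random, then kept fixed.
    weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_hidden)]
    biases = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    h = [hidden_output(x, weights, biases) for x in xs]
    # One-hot targets, one column per class.
    t = [[1.0 if y == c else 0.0 for c in range(n_classes)] for y in ys]
    # Regularized normal equations: (H^T H + ridge * I) beta = H^T T.
    hth = [[sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0)
            for j in range(n_hidden)] for i in range(n_hidden)]
    htt = [[sum(h[k][i] * t[k][c] for k in range(len(xs)))
            for c in range(n_classes)] for i in range(n_hidden)]
    beta = solve_linear(hth, htt)
    return weights, biases, beta


def predict_elm(model, x):
    """Return the class index with the largest output score."""
    weights, biases, beta = model
    h = hidden_output(x, weights, biases)
    scores = [sum(h[i] * beta[i][c] for i in range(len(h)))
              for c in range(len(beta[0]))]
    return max(range(len(scores)), key=scores.__getitem__)
```

Because training reduces to one linear solve, there is no iterative optimization of the hidden layer, which is consistent with the efficiency advantage over SVM-based exemplar models that the abstract reports.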
Keywords/Search Tags: Multi-Modal Features, KNN, ELM, MKL, RGB-D