Research On Human Action Recognition Based On Sparse Spatio-temporal Features

Posted on: 2016-07-30
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
GTID: 2308330464953279
Subject: Computer Science and Technology
Abstract/Summary:
Human action recognition is an active research topic in computer vision and an important step toward artificial intelligence. It has broad application prospects in many fields, such as motion analysis, intelligent video surveillance, human-computer interaction, virtual reality, medical care, and intelligent security. To automatically obtain action-related information from massive video data, we build on deep learning and focus on three aspects: multi-scale representation of inputs, improvement of the space-time deep belief network (ST-DBN), and different pooling strategies. We then use the improved deep neural network to learn action features for human action recognition. The main research work is as follows:
1) The inputs of existing deep learning methods are limited to a single scale, whereas objects in the real world appear at diverse scales. Considering the information interaction between different scales, we adopt a spatio-temporal Gabor filter to construct a scale space, and then select three different scales as the inputs of different channels in the ST-DBN model to jointly learn multi-scale features. The results on the KTH action dataset show that multi-scale features achieve better accuracy than single-scale features.
2) The ST-DBN model learns spatial information before temporal information, which is not well suited to action recognition. Based on the prior knowledge that temporal information matters more than spatial information in many motion analysis problems, we improve the traditional ST-DBN by learning temporal information first, which gives the TS-DBN model. The experimental results show that the TS-DBN model recognizes actions more accurately than the ST-DBN model with both single-scale and multi-scale inputs.
3) To reduce the risk of over-fitting and increase the scale invariance of action features, we introduce a more principled pooling strategy for the convolutional restricted Boltzmann machine (CRBM), called sparse pyramid pooling. Inspired by the idea of the spatial pyramid, sparse pyramid pooling applies different subsampling ratios in the pooling layer, so that the layer produces multi-level pyramid outputs. In addition, we adjust the pooling parameters so that the pooling regions overlap, which improves the performance of the pooling strategy to a certain extent. To reduce the dimension of the multi-level pyramid outputs, we use sparse coding to aggregate the outputs from the different pyramid levels. Experimental results show that sparse pyramid pooling outperforms probabilistic max-pooling, and that the pyramid network achieves accuracy comparable to that of a deeper network.
4) Based on the above research, we apply our improved deep neural network to learning sparse spatio-temporal features for action recognition on the KTH action dataset and the UCF sports dataset. The experimental results show that action feature representations based on deep learning methods achieve accuracy comparable to that of hand-crafted descriptors. Our model also improves recognition accuracy over the traditional ST-DBN model.
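The multi-level pooling idea in point 3 can be illustrated with a minimal sketch. It assumes plain deterministic max pooling on a single 2D feature map, not the probabilistic max-pooling of the CRBM, and it leaves out the sparse-coding aggregation step. The function names, the subsampling ratios (2, 4), and the one-cell overlap are illustrative assumptions, not values taken from the thesis.

```python
def max_pool_2d(fmap, size, stride):
    """Max-pool a 2D list-of-lists feature map with a square window.

    Neighbouring windows overlap whenever stride < size.
    """
    rows, cols = len(fmap), len(fmap[0])
    pooled = []
    for r in range(0, rows - size + 1, stride):
        row_out = []
        for c in range(0, cols - size + 1, stride):
            window = [fmap[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row_out.append(max(window))
        pooled.append(row_out)
    return pooled


def pyramid_pool(fmap, ratios=(2, 4), overlap=1):
    """Pool one feature map at several subsampling ratios (hypothetical helper).

    For each ratio the stride equals the ratio and the window is `overlap`
    cells wider than the stride, so adjacent pooling regions share cells.
    Returns one pooled map per ratio, i.e. the levels of the pyramid.
    """
    return [max_pool_2d(fmap, size=ratio + overlap, stride=ratio)
            for ratio in ratios]


if __name__ == "__main__":
    # Toy 8x8 feature map whose entries increase left-to-right, top-to-bottom.
    fmap = [[r * 8 + c for c in range(8)] for r in range(8)]
    for ratio, level in zip((2, 4), pyramid_pool(fmap)):
        print("ratio", ratio, "->", len(level), "x", len(level[0]), "map")
```

Each level summarizes the same feature map at a coarser resolution. In the approach described above, these levels would then be aggregated with sparse coding rather than simply concatenated.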
Keywords/Search Tags: action recognition, CRBM, TS-DBN, 3D Gabor, pooling, sparse coding