
Research On Human Action Recognition Based On Temporal And Spatial Characteristics And Deep Learning

Posted on: 2017-08-27
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Li
Full Text: PDF
GTID: 2348330533950311
Subject: Electronics and Communications Engineering
Abstract/Summary:
Human action recognition in video is a very active research topic. It is commonly divided into two steps: feature extraction and description, and feature classification. Feature extraction describes the human action by means of extracted features; the difficulty is extracting these features accurately against complex backgrounds. Feature classification assigns the feature descriptors to the different kinds of action; here the main difficulty is the design and choice of a suitable classifier. To address these problems, this thesis presents two approaches to human action recognition. The main research work is as follows:

(1) A human action recognition method based on spatial-temporal interest points and HOG-3D descriptors is proposed. First, dense spatial-temporal interest points are collected from the grayscale video frames. Second, HOG-3D descriptors of the dense spatial-temporal interest points are constructed from the collected grayscale video frames. Third, the HOG-3D descriptors are established from the grayscale video frames together with the spatial-temporal interest points. Finally, a bag-of-words model is built with the K-means clustering algorithm, a histogram of features is constructed for every video, and a support vector machine is used to recognize and classify the human actions.

(2) A human action recognition method based on convolutional neural networks and spatial-temporal interest points is proposed. First, dense spatial-temporal interest points are collected from the grayscale video frames. Second, the spatial-temporal interest points of all video frames are merged into a single image. Finally, this image is used as the input to a convolutional neural network, which is trained with manually assigned labels to perform the classification.

The KTH dataset is characterized by a simple shooting environment, uniform conditions, and simple human actions. The Hollywood2 dataset is more complex than the KTH dataset: its videos are usually close to real-life scenes, and there is camera jitter during shooting. Our experiments show that both methods achieve a high recognition rate and are robust.
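The two pipelines described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the descriptors are plain lists of numbers standing in for HOG-3D vectors, the function names (`kmeans`, `bow_histogram`, `fuse_interest_points`) are hypothetical, and projecting every interest point onto one binary image is only one assumed way of merging the points of all frames. The classifiers (the support vector machine and the convolutional neural network) are left out; they would be trained on the histograms and on the merged images, respectively.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centers, descriptor):
    """Index of the cluster center closest to the descriptor."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(centers[i], descriptor))


def kmeans(descriptors, k, iterations=20, seed=0):
    """Plain K-means; the k centers act as the visual vocabulary."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(descriptors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest(centers, d)].append(d)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def bow_histogram(video_descriptors, centers):
    """Normalized histogram of visual-word counts for one video."""
    counts = [0] * len(centers)
    for d in video_descriptors:
        counts[nearest(centers, d)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def fuse_interest_points(points, height, width):
    """Mark every (x, y, t) interest point, from any frame t, on one image."""
    image = [[0] * width for _ in range(height)]
    for x, y, _t in points:
        if 0 <= y < height and 0 <= x < width:  # ignore out-of-frame points
            image[y][x] = 1
    return image


# Toy descriptors forming two well-separated groups (stand-ins for HOG-3D).
training = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
vocabulary = kmeans(training, k=2)
video_hist = bow_histogram([[0, 0], [1, 0], [0, 1], [10, 10]], vocabulary)
fused = fuse_interest_points([(1, 2, 0), (3, 0, 5), (1, 2, 7)], height=3, width=4)
```

Each video is reduced to one fixed-length histogram regardless of how many interest points it contains, which is what allows a standard classifier to be applied afterwards.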
Keywords/Search Tags: Spatio-Temporal Interest Points, HOG-3D, SVM, Fusion Interest Points, Convolutional Neural Networks