
Spatial-Temporal Features For Human Action Recognition Based On A Hierarchical Model

Posted on: 2015-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: J W Wu
Full Text: PDF
GTID: 2268330428480407
Subject: Computer application technology
Abstract/Summary:
The research and application of computer vision continue to advance, driven by the enormous growth of image and video data and by the desire to grant computers the capabilities of the human visual system. Processing, analyzing, and understanding video by computer is an active research area, and human action recognition is one of its major directions.

Human action recognition determines the category of human behavior by analyzing action-related features extracted from a video sequence. It draws on many subjects, including image processing, artificial intelligence, and pattern recognition. The study of human action recognition has wide application value and prospects in civilian and military fields such as anti-terrorism, public security, human-machine interaction, driver assistance, virtual reality, and video retrieval. Its critical technologies are feature extraction and behavior classification.

This dissertation studies problems related to human action recognition. First, spatial-temporal interest points are extracted by a method based on video orthogonal planes. Second, spatial-temporal feature descriptors are extracted to represent human action. Finally, the hierarchical representation generated by a hierarchical bag-of-words model is fed to a classifier to recognize human action. The main contributions of the dissertation are as follows:

Building on existing methods, we improve the local spatial-temporal feature extraction algorithm from two perspectives, local interest point extraction and local descriptor construction, by introducing the concept of video orthogonal planes.
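As an illustration of the orthogonal-plane idea (a numpy-only toy sketch, not the detector actually used in the thesis), candidate points can be detected independently on the XY, XT, and YT plane families of the video volume and then combined with a vote-style rule. The gradient-based response and the two-of-three voting threshold below are simplifying assumptions standing in for the real interest operator and weak constraint.

```python
import numpy as np
from collections import Counter

def plane_response(plane):
    # Toy saliency measure: squared gradient magnitude. A real detector
    # would apply a proper 2-D interest operator (e.g. Harris) per plane.
    gy, gx = np.gradient(plane.astype(float))
    return gx ** 2 + gy ** 2

def detect_on_planes(video, thresh):
    """video: (T, H, W) volume. Detect candidates on the XY, XT, and YT
    plane families; return one (t, y, x) candidate set per family."""
    T, H, W = video.shape
    xy = set()
    for t in range(T):                      # XY plane per frame
        ys, xs = np.where(plane_response(video[t]) > thresh)
        xy.update((t, int(y), int(x)) for y, x in zip(ys, xs))
    xt = set()
    for y in range(H):                      # XT plane per image row
        ts, xs = np.where(plane_response(video[:, y, :]) > thresh)
        xt.update((int(t), y, int(x)) for t, x in zip(ts, xs))
    yt = set()
    for x in range(W):                      # YT plane per image column
        ts, ys = np.where(plane_response(video[:, :, x]) > thresh)
        yt.update((int(t), int(y), x) for t, y in zip(ts, ys))
    return [xy, xt, yt]

def weak_constraint(families, min_votes=2):
    # Combine the per-family candidate sets: keep a voxel only if it fires
    # in at least `min_votes` of the three families (a loose stand-in for
    # the set operations plus weak constraint rule).
    votes = Counter(p for fam in families for p in fam)
    return {p for p, n in votes.items() if n >= min_votes}
```

Detecting on 2-D planes keeps each response computation cheap, while the voting step suppresses points that are salient in only one projection of the volume.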
In the new method of spatial-temporal interest point extraction, the entire video sequence is treated as a three-dimensional volume in which interest points are detected on the planes along each dimension; the spatial-temporal feature points are then obtained by set operations combined with a weak constraint rule.

The spatial-temporal descriptors are generated by integrating a histogram of oriented gradients (HOG) and a histogram of optical flow (HOF) within a cuboid centered on each interest point. In this process, the cuboid is obtained by symmetric extension rather than equal partitioning, and the histograms within the cuboid are combined through a weighting function. A low-dimensional representation of the high-dimensional feature is then generated by the locality preserving projection (LPP) algorithm, because an LPP-based low-dimensional representation preserves local information.

By introducing the concept of feature pools, we construct a hierarchical bag-of-words model that forms a one-in-multi-out mechanism: it outputs multi-level representations of human action that describe the action at different grains, so that both structural and local information are taken into account.

Experiments are carried out on common standard video datasets and verify the feasibility and validity of the proposed method. The results show that the spatial-temporal interest points extracted by the proposed method possess better stability and representativeness, that the extracted spatial-temporal feature descriptors capture appearance and motion information well, and, in particular, that the action representation based on the hierarchical model achieves better recognition results.
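A minimal sketch of the one-in-multi-out mechanism, under the assumption that one visual vocabulary per level has already been learned (for instance by k-means over training descriptors; the vocabularies below are hypothetical): the same set of local descriptors is quantized against each level's codebook, yielding one histogram per level as the multi-grain representation.

```python
import numpy as np

def bow_histogram(descs, vocab):
    # Assign each local descriptor to its nearest visual word and build
    # a normalized word-occurrence histogram.
    d2 = ((descs[:, None, :] - vocab[None, :, :]) ** 2).sum(axis=-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocab)).astype(float)
    return hist / max(hist.sum(), 1.0)

def hierarchical_bow(descs, vocabs):
    """One-in-multi-out: the same descriptor set enters once and one
    histogram per vocabulary level (coarse -> fine) comes out, giving a
    multi-grain representation of the action."""
    return [bow_histogram(descs, v) for v in vocabs]
```

A classifier can then consume the concatenation of all levels, so that coarse histograms contribute structural information while fine histograms contribute local detail.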
Keywords/Search Tags: Spatial-Temporal Feature, Hierarchical Bag-of-Words Model, Human Action Recognition, Orthogonal Plane Interest Point