Research On The Local Spatio-Temporal Relationships Based Feature Model For Action Recognition

Posted on: 2017-02-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T C Zhou
Full Text: PDF
GTID: 1108330491463034
Subject: Information and Communication Engineering
Abstract/Summary:
Human action recognition in video is an active research topic in pattern recognition and computer vision. It has broad application prospects in areas such as intelligent surveillance, human-computer interaction, abnormal behavior detection, and video retrieval. The aim of action recognition research is to give machines a human-like ability to identify, analyze, understand, and predict human actions. Although video-based human action recognition has made considerable progress, open problems remain, including how to capture and describe local features efficiently and accurately, learn the relationships among them, fuse features, and model actions.

To address these problems, this dissertation studies approaches to human action recognition based on local features. The main research contents and contributions can be summarized as follows:

1) Effective local features and description. Because of camera motion, local features extracted by tracking interest points contain irrelevant information from the moving background. Meanwhile, quantizing orientation with a single directional pattern leads to quantization error. To deal with these problems, we propose to model actions by tracking points with relative motion and by quantizing trajectory orientation with multiple directional patterns. To select valid trajectory starting points, super-pixel segmentation and a motion boundary detector that suppresses constant camera motion are adopted. For the trajectory shape, pre-defined multiple directional patterns are employed to produce distribution statistics of the direction of trajectory displacement. On the KTH and UCF-sports datasets, the extracted trajectories describe the changes of moving objects, and the directional statistics with multiple patterns improve robustness to trajectory shape. Compared with the related literature, the trajectory features extracted by tracking the selected interest points obtain good recognition performance under the Multiple Kernel Learning (MKL) framework.

2) Sparse coding based on a hierarchical tree-structured dictionary. Sparse coding can adaptively represent a signal. However, the similarity among signals is lost because the atoms are not correlated with one another. Considering the robustness of structured sparse representation, a tree-structured dictionary is proposed to encode the local features. In a hierarchical way, our method learns multiple sub-dictionaries and builds relations between the atoms in the upper and lower layers. Specifically, on top of the standard dictionary learning algorithm, a constraint on the coding path of each data point is introduced into the convex optimization, so that the index selected in the upper layer is passed on to the next layer. Experimental results on the KTH action dataset show that descriptors encoded with the learnt tree-structured dictionary are robust, and that our algorithm generally obtains higher recognition accuracy than comparable methods in the literature. The tree-structured model learnt by our method outperforms the standard sparse representation for action recognition.

3) Action model based on a hierarchy of feature groups. Spatio-temporal relationships used to describe action prototypes have shown great promise for complex human action recognition. However, learnt compound features such as aggregate statistics and feature pairs, which rely on unstable spatio-temporal interest points (STIPs) and the Euclidean metric, may lack semantic meaning and robustness to intra-class variability. To tackle this problem, and considering the hierarchical structure of human motion, an action model that is a hierarchy of feature groups is proposed. To suppress information from the moving scene, motion compensation and the properties of human body parts are introduced. To generate features belonging to one human body, an adaptive-scale kernel density clustering algorithm is used to label local features. Specifically, after motion compensation, the residual information generated by temporal differencing is selected according to the spatial and temporal properties of moving body parts; Mean-Shift clustering with an adaptive-scale kernel is then used to label the residual information and generate a part-based representation. Under the part-based descriptor, visual word responses are accumulated to describe narrow video clips. On the benchmark KTH and UCF-sports action datasets, experiments show that the action model built from feature groups enhances the discriminative power of the action representation and improves recognition performance.

4) Action model based on a feature-tree of human body parts. The spatio-temporal context learnt by traditional action recognition methods lacks semantic meaning and temporal relationships. To address these drawbacks, a novel semantic context feature-tree model is proposed to model the relations among body parts at different temporal resolutions. In our method, super-pixels are used to label the extracted STIPs and generate point sets with spatial semantic co-occurrence. To build the feature-tree, nodes that are temporal nearest neighbors are fused recursively by patch matching. The co-occurrence domain constructed from super-pixels to represent a body part is flexible. In addition, fusing temporally nearest-neighbor nodes by image matching reduces the difficulty of matching the point sets. On the KTH, UCF-YT and HOHA datasets, experimental results show that the learnt feature-tree model captures the relationships among body parts, enhances the discriminative power of the action descriptor, and obtains promising results under the MKL framework.

5) Action model based on discriminative local co-occurrence of concept feature pairs. Quantizing local features by k-means tends to produce large quantization error. Meanwhile, co-occurrence statistics learnt by traditional methods ignore the direction and relative distance of feature pairs. To deal with these problems, a new action model that employs concept feature pairs to build discriminative co-occurrence statistics is proposed. To learn the concept features, sparse subspace clustering with a local manifold constraint is used to quantize the STIPs. To enhance the discriminative power of action units, the direction and relative distance of the feature pairs are embedded into the co-occurrence statistics. In addition, considering the diversity of human action styles, spatio-temporal volumes with multiple temporal resolutions are used to model action prototypes. On the popular KTH and UCF-sports datasets, experimental results show that the multi-channel co-occurrence statistics and STIPs fused by MKL obtain promising recognition performance compared with other state-of-the-art algorithms. Meanwhile, compared with point-based action representation, the co-occurrence statistics of concept feature pairs have good discriminative power.
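As an illustration of the idea in item 5), embedding direction and relative distance into co-occurrence statistics, the following is a minimal Python sketch and not the dissertation's implementation. The function name `cooccurrence_counts`, the 2-D point layout, the number of direction bins, and the distance bin edges are all assumptions chosen for the example. The integer labels stand in for concept features, which the dissertation obtains by sparse subspace clustering (not shown here).

```python
import math
from bisect import bisect_right
from collections import Counter


def cooccurrence_counts(points, labels, n_dir_bins=4, dist_edges=(0.0, 10.0, 20.0)):
    """Count ordered feature pairs by (label_i, label_j, direction bin, distance bin).

    points     : list of (x, y) interest-point positions (hypothetical layout).
    labels     : list of integer concept indices, one per point.
    n_dir_bins : number of equal angular bins covering [0, 2*pi).
    dist_edges : increasing distance bin edges; pairs at or beyond the
                 last edge are ignored as non-local.
    """
    counts = Counter()
    n_dist_bins = len(dist_edges) - 1
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            dx, dy = xj - xi, yj - yi
            # Relative distance bin of the pair.
            dist = math.hypot(dx, dy)
            d_bin = bisect_right(dist_edges, dist) - 1
            if not 0 <= d_bin < n_dist_bins:
                continue
            # Direction bin of the displacement from point i to point j.
            angle = math.atan2(dy, dx) % (2 * math.pi)
            a_bin = int(angle / (2 * math.pi) * n_dir_bins) % n_dir_bins
            counts[(labels[i], labels[j], a_bin, d_bin)] += 1
    return counts


# Toy usage: three points, two concept labels.
points = [(0, 0), (3, 3), (-10, 10)]
labels = [0, 1, 1]
counts = cooccurrence_counts(points, labels)
for key, value in sorted(counts.items()):
    print(key, value)
```

Because the key includes the direction and distance bins, two pairs with the same labels but a different spatial arrangement fall into different bins, which a plain label-pair count would merge.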
Keywords/Search Tags: action recognition, local features, spatio-temporal relationships, sparse coding, feature fusion, Multiple Kernel Learning