Research On The Algorithms Of Video-Based Human Action Analysis

Posted on: 2015-11-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X J Peng
GTID: 1228330461474261
Subject: Computer application technology
Abstract/Summary:
Understanding the meaning of an action is an essential aspect of human social communication. Advances in computer hardware and software now make it possible to analyze some visual human actions automatically. Video-based human action analysis has become a hot topic in the computer vision and multimedia communities because of its potential applications in human-computer interaction, smart surveillance and content-based video retrieval. However, analyzing actions in videos is very challenging because of the imprecise definition of actions and variations in action speed, viewpoint and background.

The key problem in video-based human action analysis is how to represent action videos effectively. Although many works have studied video representation, there is still no effective solution for human action analysis. Generally, human action analysis comprises three tasks: action recognition (or action classification), action detection, and action verification (or action similarity labeling). This thesis mainly studies action recognition and action similarity labeling from the perspective of video representation. The main contents are as follows.

Firstly, we focus on the extraction and description of local regions in videos. We propose a dense trajectory sampling strategy based on motion boundaries to extract local features. Compared with classical dense trajectory sampling, our strategy removes a large number of trajectories on the background while preserving most of the trajectories from the actors. Three kinds of spatio-temporal (3D) co-occurrence descriptors are presented to describe the trajectories, namely 3D-CoHOG, 3D-CoHOF and 3D-CoMBH. Moreover, we propose three multi-channel decomposition schemes for these 3D co-occurrence descriptors, which significantly improve performance.

Secondly, we study mid-level models for video representation, approaching them from three aspects.
First, we provide a comprehensive study of the popular Bag of Visual Words (BoVW) model and of multiple feature fusion. Specifically, we study every step of the BoVW pipeline: feature extraction, feature preprocessing, dictionary generation, feature coding, pooling and normalization. We also explore three kinds of feature fusion: descriptor-level fusion, representation-level fusion and score-level fusion. Second, we propose a supervised supervector feature coding method for human action recognition. This approach introduces a supervised dictionary into the supervector coding method within the BoVW framework and improves action classification accuracy. Third, we propose a stacked Fisher vector feature coding method for action recognition. Inspired by deep neural networks, we use two Fisher vector layers within the BoVW framework and introduce a supervised dimensionality reduction method for the first layer. Experimental results demonstrate the effectiveness of this method.

Finally, we study the action similarity labeling task based on supervised dimensionality reduction. Specifically, each video is represented by BoVW using a state-of-the-art supervector feature coding method; a large margin dimensionality reduction method is then proposed to reduce the dimension of this video representation; finally, the representation of each video pair is fed into an SVM classifier. Experimental results show that our method improves the performance of action similarity labeling.
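The abstract does not state the exact criterion used for motion-boundary-based dense trajectory sampling, so the following is only a minimal sketch of the idea, assuming that a grid point is kept when the motion boundary magnitude (the spatial gradient magnitude of the optical flow) at that point reaches a fraction of the frame maximum. The function names and the `rel_threshold` parameter are hypothetical. Uniform motion, such as that produced by a panning camera, has zero flow gradient, which is why a criterion of this kind tends to discard background points.

```python
from typing import List, Tuple

# flow[y][x] = (u, v): optical flow displacement at pixel (x, y)
Flow = List[List[Tuple[float, float]]]


def motion_boundary_magnitude(flow: Flow) -> List[List[float]]:
    """Per-pixel magnitude of the spatial gradient of the flow field.

    Uses forward differences (clamped at the border) on both the u and v
    channels, so constant (e.g. camera-induced translational) flow gives zero.
    """
    h, w = len(flow), len(flow[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            x1, y1 = min(x + 1, w - 1), min(y + 1, h - 1)
            total = 0.0
            for c in (0, 1):  # 0 -> u channel, 1 -> v channel
                dx = flow[y][x1][c] - flow[y][x][c]
                dy = flow[y1][x][c] - flow[y][x][c]
                total += dx * dx + dy * dy
            mag[y][x] = total ** 0.5
    return mag


def sample_points(flow: Flow, step: int = 1,
                  rel_threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Keep grid points whose motion boundary magnitude is at least
    rel_threshold times the frame maximum (hypothetical criterion)."""
    mag = motion_boundary_magnitude(flow)
    peak = max(max(row) for row in mag)
    if peak == 0.0:
        return []  # no motion boundaries at all: nothing to track
    return [(x, y)
            for y in range(0, len(mag), step)
            for x in range(0, len(mag[0]), step)
            if mag[y][x] >= rel_threshold * peak]
```

For a 4×4 flow field whose left half moves one pixel to the right while the right half is static, only the column at the discontinuity survives, whereas a field with the same flow everywhere yields no points at all.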
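As their names suggest, the 3D co-occurrence descriptors carry the co-occurrence idea of image descriptors such as CoHOG into the spatio-temporal volume around a trajectory. The abstract does not specify the offsets, the quantization or the normalization, so this sketch illustrates only the general mechanism under assumed inputs: given a volume of quantized orientation labels (from image gradients, optical flow or motion boundaries), it counts, for each hypothetical offset `(dt, dy, dx)`, how often label `a` at one position co-occurs with label `b` at the offset position, and concatenates the resulting `num_bins × num_bins` histograms.

```python
from typing import List, Sequence, Tuple

# labels[t][y][x] is a quantized orientation bin in [0, num_bins)
Volume = Sequence[Sequence[Sequence[int]]]


def cooccurrence_descriptor(labels: Volume, num_bins: int,
                            offsets: Sequence[Tuple[int, int, int]]) -> List[int]:
    """Concatenate one num_bins x num_bins co-occurrence histogram per offset."""
    T, H, W = len(labels), len(labels[0]), len(labels[0][0])
    descriptor: List[int] = []
    for dt, dy, dx in offsets:
        hist = [0] * (num_bins * num_bins)
        for t in range(T):
            for y in range(H):
                for x in range(W):
                    t2, y2, x2 = t + dt, y + dy, x + dx
                    # only count pairs whose second position lies inside the volume
                    if 0 <= t2 < T and 0 <= y2 < H and 0 <= x2 < W:
                        a, b = labels[t][y][x], labels[t2][y2][x2]
                        hist[a * num_bins + b] += 1
        descriptor.extend(hist)
    return descriptor
```

For two 1×2 frames both labeled `[[0, 1]]`, two orientation bins, and the offsets `(0, 0, 1)` (right neighbor in the same frame) and `(1, 0, 0)` (same pixel in the next frame), the descriptor is `[0, 2, 0, 0, 1, 0, 0, 1]`: the spatial offset sees the pair (0, 1) twice, and the temporal offset sees (0, 0) and (1, 1) once each.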
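To make the BoVW coding, pooling and normalization steps concrete, here is a minimal sketch that uses VLAD, a simple supervector-style encoding, as a stand-in. It is not the supervised supervector coding or the stacked Fisher vector coding proposed in the thesis. The codebook is assumed to have been generated beforehand (for example by k-means), and the function names are illustrative.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_codeword(codebook: Sequence[Vector], x: Vector) -> int:
    """Hard assignment: index of the codeword closest to x (squared Euclidean)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - c) ** 2 for a, c in zip(x, codebook[k])))


def vlad_encode(descriptors: Sequence[Vector],
                codebook: Sequence[Vector]) -> List[float]:
    """Encode a set of local descriptors into one K*D video-level vector."""
    k, d = len(codebook), len(codebook[0])
    v = [0.0] * (k * d)
    # Coding + sum pooling: accumulate residuals per assigned codeword.
    for x in descriptors:
        j = nearest_codeword(codebook, x)
        for i in range(d):
            v[j * d + i] += x[i] - codebook[j][i]
    # Normalization: signed square root (power normalization), then L2.
    v = [math.copysign(math.sqrt(abs(a)), a) for a in v]
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else v
```

With the codebook `[[0, 0], [10, 10]]` and descriptors `[[1, 0], [0, 3], [11, 10]]`, the pooled residuals are `[1, 3, 1, 0]`; after the signed square root and L2 normalization the output is proportional to `[1, √3, 1, 0]` with unit norm. Replacing `vlad_encode` with a plain histogram of `nearest_codeword` indices gives the classical hard-assignment BoVW representation.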
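The three fusion strategies differ only in where the channels (for example, different descriptor types) are combined. The sketch below uses hypothetical function names and, as an assumption, equal default weights for score fusion. Descriptor-level fusion concatenates each trajectory's per-channel descriptors before coding, representation-level fusion concatenates the per-channel video encodings, and score-level fusion combines per-channel classifier scores.

```python
from itertools import chain
from typing import List, Optional, Sequence, Tuple


def descriptor_level_fusion(
        channels: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """channels[c][n] is the descriptor of trajectory n in channel c.
    Returns one concatenated descriptor per trajectory, to be coded jointly."""
    num_traj = len(channels[0])
    return [list(chain.from_iterable(ch[n] for ch in channels))
            for n in range(num_traj)]


def representation_level_fusion(encodings: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the video-level encodings computed separately per channel."""
    return list(chain.from_iterable(encodings))


def score_level_fusion(scores: Sequence[Sequence[float]],
                       weights: Optional[Sequence[float]] = None
                       ) -> Tuple[int, List[float]]:
    """scores[c][k] is channel c's classifier score for class k.
    Returns the predicted class index and the weighted-sum scores."""
    num_ch, num_cls = len(scores), len(scores[0])
    if weights is None:
        weights = [1.0 / num_ch] * num_ch  # assumption: equal weights
    fused = [sum(w * s[k] for w, s in zip(weights, scores))
             for k in range(num_cls)]
    return max(range(num_cls), key=fused.__getitem__), fused
```

Descriptor-level fusion yields a single, higher-dimensional descriptor space and therefore a single dictionary, while the other two strategies keep a separate dictionary and encoding per channel.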
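The abstract does not give the large margin dimensionality reduction objective. One common hinge-loss formulation, assumed here for illustration only, learns a projection matrix `W` and a threshold `b` so that the squared projected distance d^2 = ||W(x_i - x_j)||^2 falls below `b` for pairs showing the same action (y = +1) and above it for different actions (y = -1), by minimizing max(0, 1 - y(b - d^2)) summed over training pairs. The sketch evaluates this loss and takes one stochastic subgradient step; all names are hypothetical. The thesis feeds a representation of each reduced video pair into an SVM classifier, so the `same_action` threshold here is only a simple sanity check.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
# (x_i, x_j, y): two video representations and +1 (same action) / -1 (different)
Pair = Tuple[Sequence[float], Sequence[float], int]


def project(W: Matrix, v: Sequence[float]) -> List[float]:
    """Matrix-vector product W v (W is low_dim x high_dim)."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def sq_distance(W: Matrix, xi: Sequence[float], xj: Sequence[float]) -> float:
    """Squared Euclidean distance between the projected videos."""
    delta = [a - c for a, c in zip(xi, xj)]
    return sum(p * p for p in project(W, delta))


def hinge_loss(W: Matrix, b: float, pairs: Sequence[Pair]) -> float:
    """Sum over pairs of max(0, 1 - y * (b - d^2))."""
    return sum(max(0.0, 1.0 - y * (b - sq_distance(W, xi, xj)))
               for xi, xj, y in pairs)


def sgd_step(W: Matrix, b: float, pair: Pair, lr: float) -> Tuple[Matrix, float]:
    """One subgradient step on a single pair; returns the updated (W, b)."""
    xi, xj, y = pair
    delta = [a - c for a, c in zip(xi, xj)]
    proj = project(W, delta)
    d2 = sum(p * p for p in proj)
    if y * (b - d2) >= 1.0:
        return W, b  # margin already satisfied: zero subgradient
    # loss = 1 - y*b + y*d2, so dL/dW = 2*y*(W delta) delta^T and dL/db = -y
    new_W = [[W[r][c] - lr * 2.0 * y * proj[r] * delta[c]
              for c in range(len(delta))] for r in range(len(W))]
    return new_W, b + lr * y


def same_action(W: Matrix, b: float,
                xi: Sequence[float], xj: Sequence[float]) -> bool:
    """Simple verification rule: same action if d^2 < b."""
    return sq_distance(W, xi, xj) < b
```

In a toy 2D setting where same-action pairs differ only along the second coordinate and different-action pairs along the first, the projection `[[1, 0]]` with `b = 1` has zero loss, while `[[0, 1]]` has loss 2.25; repeated calls to `sgd_step` over shuffled training pairs would move `W` toward the former.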
Keywords/Search Tags: Human action analysis, human action recognition, action similarity labeling, bag of visual words model, feature coding, dense trajectories, dictionary learning, large margin dimensionality reduction