
Research On Some Problems Of Human Action Recognition In Videos

Posted on: 2017-01-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L S Pei
GTID: 1108330485988396
Subject: Computer application technology
Abstract/Summary:
Action recognition is a popular and important research direction in computer vision, machine learning, and artificial intelligence. It analyzes and recognizes human actions from image and video data, and its results have practical applications in security monitoring, care of patients and people with disabilities, multimedia content understanding, human-computer interaction, and virtual reality. However, current action recognition technologies have many limitations in real applications. Driven by practical demands, this thesis studies the following four questions about human action recognition:
1) When action samples are very difficult to collect, how can specific actions be recognized effectively from few samples?
2) In complex scenes where the action performer can be detected, how can specific actions be recognized effectively?
3) In complex scenes where the action performer can be detected, how can multi-class actions be recognized quickly and effectively?
4) In complex scenes where the action performer cannot be located, how can multi-class actions be recognized effectively?
Starting from these practical problems and building on pattern recognition, machine learning, and related theory, this thesis carries out a series of innovative studies and proposes solutions to the four problems above. The main research work and contributions of this thesis are as follows:
1) Based on Hough space voting, a global action representation, the displacement histogram sequence, is proposed. First, the approach coarsely estimates the motion regions of the action videos. Then, based on the interest points matched between consecutive image frames within the motion regions, it uses a two-dimensional displacement histogram to represent the human movement between those frames. Finally, with each action represented as a displacement histogram sequence, the matrix cosine similarity metric is used to recognize actions. For a recognized action, the matched interest points precisely locate it in space and time. Experimental results demonstrate that, in static scenes or against uniform backgrounds, the proposed method can effectively recognize and detect specific actions. In addition, this coarse-to-fine localization effectively speeds up action representation. This approach solves the problem of recognizing and detecting specific actions from few action samples.
2) A method that learns spatial temporal features of human actions from a new viewpoint is proposed. First, the approach detects and tracks the action performer. Based on the detection and tracking results, the temporal shape features of human body parts are encoded as spatial temporal features by multiple Restricted Boltzmann Machines (RBMs). Then, a Restricted Boltzmann Machine neural network integrates the spatial temporal feature codes into a global spatial temporal feature of the action video. Finally, trained support vector machine classifiers are used to recognize actions. Extensive experiments validate the effectiveness of the proposed approach. Extracting spatial temporal features from the shape feature sequences of human body parts opens up a new viewpoint for human action feature extraction. This method solves the specific action recognition problem in more complex situations.
3) A fast multi-class action recognition algorithm based on inverted indexes is proposed. First, based on the detection and tracking results for the action performer, shape and motion features are extracted from the regions of interest of the human action. From those features, an action state binary tree is built using hierarchical clustering. Using the binary tree, a human action can be quickly represented as an action state sequence. Then, for each action state sequence, two score vectors covering all action categories are computed by searching the action state inverted index table and the action state transition inverted index table. Finally, actions are recognized using the weighted score vectors. Experiments demonstrate that the proposed method can quickly recognize multi-class actions. The action state binary tree speeds up the conversion of an action video into an action state sequence, and the inverted index tables markedly increase multi-class recognition speed. This approach solves the fast multi-class action recognition problem in more complex situations.
4) Based on an independent subspace analysis network, an approach is proposed that uses learned spatial features invariant under temporal slowness to encode spatial temporal features for video actions. First, using an independent subspace analysis network constrained by regularization terms, the approach learns a group of temporal slowness invariant spatial features. For the spatial features extracted from each sampled video cuboid, pooling in the spatial and temporal domains makes the output local spatial temporal features effective for action recognition. Then, the extracted local spatial temporal features are organized by a Bag-of-Features model to represent video actions. Finally, nonlinear support vector machine classifiers are used to recognize multi-class actions. Experimental results demonstrate that the temporal slowness invariance regularization and the denoising criterion make the learned spatial features and the extracted local spatial temporal features robust to cluttered backgrounds, occlusion, and similar disturbances. This approach solves multi-class action recognition problems in complex situations.
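The matching step of contribution 1 can be illustrated with a small sketch. It is not the thesis implementation: the bin count, the displacement range, the stacking of one flattened histogram per row, and all function names are assumptions made for illustration. Matrix cosine similarity is taken here in its usual form, the Frobenius inner product of two matrices divided by the product of their Frobenius norms.

```python
import math


def displacement_histogram(matches, bins=8, max_disp=16.0):
    """2D histogram of (dx, dy) displacements between matched interest points.

    matches: list of ((x0, y0), (x1, y1)) pairs from two consecutive frames.
    bins and max_disp are illustrative choices, not values from the thesis.
    """
    hist = [[0.0] * bins for _ in range(bins)]
    width = 2.0 * max_disp / bins
    for (x0, y0), (x1, y1) in matches:
        dx, dy = x1 - x0, y1 - y0
        # Clamp so that out-of-range displacements land in the border bins.
        col = min(bins - 1, max(0, int((dx + max_disp) / width)))
        row = min(bins - 1, max(0, int((dy + max_disp) / width)))
        hist[row][col] += 1.0
    return hist


def sequence_to_matrix(hist_seq):
    """Stack a histogram sequence as a matrix, one flattened histogram per row."""
    return [[v for row in hist for v in row] for hist in hist_seq]


def matrix_cosine_similarity(a, b):
    """Frobenius inner product of a and b over the product of their norms."""
    dot = sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    norm_a = math.sqrt(sum(x * x for r in a for x in r))
    norm_b = math.sqrt(sum(x * x for r in b for x in r))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty representation matches nothing
    return dot / (norm_a * norm_b)


# Two toy frame pairs: points moving right versus points moving left.
right = [((0, 0), (4, 0)), ((10, 5), (14, 5))]
left = [((4, 0), (0, 0)), ((14, 5), (10, 5))]
query = sequence_to_matrix([displacement_histogram(right)] * 2)
template = sequence_to_matrix([displacement_histogram(left)] * 2)
same = matrix_cosine_similarity(query, query)
opposite = matrix_cosine_similarity(query, template)
```

In this toy case the query matches itself with similarity close to 1, while the opposite motion falls into different bins and scores 0. A real system would compare a query against stored templates of equal length and threshold the similarity.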
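The scoring step of contribution 3 can be sketched in the same spirit. The abstract does not say how the entries of the two inverted index tables are weighted, so this sketch assumes raw occurrence counts and equal weights for the two score vectors. The names `build_indexes`, `score`, and `recognize` are hypothetical, and the action states are assumed to be integer leaf ids already produced by the binary tree.

```python
from collections import defaultdict


def build_indexes(training):
    """Build both inverted indexes from (category, state_sequence) pairs.

    state_index maps state -> {category: count}.
    trans_index maps (state, next_state) -> {category: count}.
    Raw counts are an assumption; the thesis may weight entries differently.
    """
    state_index = defaultdict(lambda: defaultdict(int))
    trans_index = defaultdict(lambda: defaultdict(int))
    for category, seq in training:
        for state in seq:
            state_index[state][category] += 1
        for pair in zip(seq, seq[1:]):
            trans_index[pair][category] += 1
    return state_index, trans_index


def score(seq, state_index, trans_index, categories, w_state=0.5, w_trans=0.5):
    """Return the weighted combination of the two per-category score vectors."""
    state_scores = dict.fromkeys(categories, 0.0)
    trans_scores = dict.fromkeys(categories, 0.0)
    # Only categories listed under an observed state or transition are touched,
    # which is what makes the lookup cheap when there are many categories.
    for state in seq:
        for category, count in state_index.get(state, {}).items():
            state_scores[category] += count
    for pair in zip(seq, seq[1:]):
        for category, count in trans_index.get(pair, {}).items():
            trans_scores[category] += count
    return {c: w_state * state_scores[c] + w_trans * trans_scores[c]
            for c in categories}


def recognize(seq, state_index, trans_index, categories):
    """Pick the category with the highest combined score."""
    combined = score(seq, state_index, trans_index, categories)
    return max(combined, key=combined.get)


# Toy training data: each action is a short sequence of state ids.
training = [("wave", [0, 1, 0, 1]), ("walk", [2, 3, 2, 3])]
categories = ["wave", "walk"]
state_index, trans_index = build_indexes(training)
label_a = recognize([0, 1, 0], state_index, trans_index, categories)
label_b = recognize([3, 2, 3], state_index, trans_index, categories)
```

The transition table lets the order of states influence the result, which the state table alone cannot do: two actions that share the same states in a different order still receive different transition scores.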
Keywords/Search Tags:Action Recognition, Action Detection, Spatial Temporal Features, Deep Learning, Inverted Index Tables