
Research Of Human Action Recognition Based On Video

Posted on: 2014-02-02  Degree: Master  Type: Thesis
Country: China  Candidate: J Q Liu  Full Text: PDF
GTID: 2248330398961377  Subject: Digital media technology and the arts
Abstract/Summary:
In recent years, action recognition has become an active research topic, with a wide range of applications and potential economic value in intelligent surveillance, human-computer interaction, and video indexing. The main task of action recognition is to process and analyze the original image sequences, learn and understand human actions or behaviors, and establish the correspondence between low-level visual features and high-level semantic information.

There are two main problems in action recognition: action representation and action classification. Action representation aims to describe the action by extracting effective features from the video, and action classification designs an appropriate classification model according to the extracted features. Depending on the action representation used, we group existing work into three categories: human-model-based methods, global feature methods, and local feature methods. Among them, local feature methods have become more popular in recent years and have achieved excellent results on multiple human action datasets.

Video feature extraction and description is a crucial step in human action recognition and has an important impact on the recognition result. We first review existing trajectory extraction methods in detail and point out their innovations and shortcomings. On this basis, we propose a new trajectory extraction method that reflects motion information accurately. To encode the shape and motion information of a given trajectory, we extract three kinds of descriptors from its local neighborhood: histograms of oriented gradients (HOG), histograms of optical flow (HOF), and motion boundary histograms (MBH).

In many cases, human actions can be identified not only by observing the human body in motion but also from properties of the surrounding scene, which provide contextual information about the action taking place. In our work, we address this by modeling the scene in which the action takes place using the Gist feature.

We also adopt the bag-of-words model, which represents a video as a collection of visual words. However, the model usually ignores the temporal and spatial relationships between local features, so we divide the video sequence into a space-time grid to embed structural information; the final histogram is obtained by concatenating the histograms of all cells in the grid. To achieve the best classification results, we integrate these two types of features (trajectory and scene) using multiple kernel learning.

Recent papers report high recognition accuracies for actions in controlled settings. At the same time, action recognition remains a very challenging problem in realistic settings such as TV broadcasts, movies, or surveillance videos. To verify the effectiveness and feasibility of the proposed algorithm, we run experiments on four datasets of varying complexity: the basic Weizmann, the realistic UCF, and the challenging YouTube and Hollywood datasets. Experimental results demonstrate that the proposed method adapts well to complex conditions in realistic settings, such as camera movement, illumination changes, and different clothing, and achieves better recognition performance.
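
The space-time grid bag-of-words representation mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function name grid_bow_histogram, the (x, y, t, word) feature layout, and the choice of raw counts without normalization are all assumptions made for illustration. Each local feature is assumed to be already quantized to a visual word index.

```python
from typing import List, Sequence, Tuple

# Hypothetical feature layout: (x, y, t, visual_word_id).
Feature = Tuple[float, float, float, int]


def grid_bow_histogram(
    features: Sequence[Feature],
    extent: Tuple[float, float, float],
    grid: Tuple[int, int, int],
    vocab_size: int,
) -> List[int]:
    """Concatenate per-cell visual-word counts over a space-time grid."""
    width, height, length = extent  # video size in x, y and time
    nx, ny, nt = grid               # number of cells along x, y and time
    hist = [0] * (nx * ny * nt * vocab_size)
    for x, y, t, word in features:
        # Map the position to a cell index, clamping points on the far border.
        cx = min(int(x / width * nx), nx - 1)
        cy = min(int(y / height * ny), ny - 1)
        ct = min(int(t / length * nt), nt - 1)
        cell = (ct * ny + cy) * nx + cx
        # Each cell owns a contiguous block of vocab_size bins.
        hist[cell * vocab_size + word] += 1
    return hist
```

With a 1x1x1 grid this reduces to the ordinary bag-of-words histogram; finer grids keep coarse information about where and when each visual word occurs, at the cost of a longer descriptor.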
Keywords/Search Tags: action recognition, trajectory, scene, bag-of-words, multiple kernel learning