
Depth-information Based Human Action Recognition

Posted on: 2016-11-15 | Degree: Master | Type: Thesis
Country: China | Candidate: J X Zhao
GTID: 2308330476953321 | Subject: Computer Science and Technology
Abstract/Summary:
Human action recognition is an important yet challenging task in computer vision. Because of its many applications in domains such as visual surveillance, virtual reality and human–computer interaction, it has attracted growing attention from researchers and engineers. Traditional human action recognition models rely on RGB pixels. Since they are easily disturbed by background clutter and lighting changes, most of these models are difficult to apply in real-world engineering. Recently developed commercial depth sensors open up new possibilities for tackling this problem. This paper focuses on depth-information based human action recognition. Based on the coordinates of the human skeleton, we propose a human action recognition framework consisting of two parts: (1) feature extraction based on depth information, and (2) classifier construction. In the first part, we find that the raw data captured by a depth camera describe only the human pose. Although the pose is a higher-level feature than RGB pixels and has a real physical interpretation, it does not by itself capture motion. Since a human action is a time series describing body movement, the extracted feature should reflect both the movements of the body and their temporal changes. To capture human motion, we construct a better descriptor that combines the pose with the velocity and acceleration of the skeleton joints. In the second part, we introduce temporal pyramid matching, a variant of Spatial Pyramid Matching (a classical model in image classification), to describe temporal variation at different scales. The two most important contributions of this paper lie in classifier construction: (1) key-frame extraction based on multiple instance learning, and (2) a latent structural support vector machine that treats the parameters of the temporal features as latent variables.
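As an illustration only, the following Python sketch builds a per-frame descriptor from skeleton joint coordinates by concatenating pose, velocity and acceleration, as described above. The central finite-difference scheme, the edge padding at sequence boundaries and the function name `frame_descriptor` are assumptions; the abstract does not state how velocity and acceleration are estimated or whether the joints are normalized first.

```python
from typing import List, Sequence, Tuple

# One skeleton frame: a list of (x, y, z) joint coordinates from the depth sensor.
Frame = Sequence[Tuple[float, float, float]]


def _clamp(t: int, n: int) -> int:
    """Clamp a frame index into [0, n - 1]."""
    return max(0, min(n - 1, t))


def frame_descriptor(seq: Sequence[Frame], t: int) -> List[float]:
    """Concatenate pose, velocity and acceleration of every joint at frame t.

    Velocity and acceleration use central finite differences (an assumption);
    neighbours outside the sequence are replaced by the nearest valid frame.
    """
    n = len(seq)
    prev, cur, nxt = seq[_clamp(t - 1, n)], seq[t], seq[_clamp(t + 1, n)]
    pose: List[float] = []
    vel: List[float] = []
    acc: List[float] = []
    for j_prev, j_cur, j_next in zip(prev, cur, nxt):
        for a, b, c in zip(j_prev, j_cur, j_next):
            pose.append(b)
            vel.append((c - a) / 2.0)    # first derivative
            acc.append(c - 2.0 * b + a)  # second derivative
    return pose + vel + acc


# Example: a single joint moving along x with positions 0, 1, 4.
seq = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(4.0, 0.0, 0.0)]]
print(frame_descriptor(seq, 1))  # [1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 2.0, 0.0, 0.0]
```

With J joints the descriptor has 9J entries: the pose alone would be identical for a joint at rest and a joint passing through the same position at speed, while the velocity and acceleration terms separate the two.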
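A minimal sketch of equal-split temporal pyramid pooling follows, assuming the input is a sequence of per-frame feature vectors (for example, the pose/velocity/acceleration descriptor described above). Mean pooling within each segment and the name `temporal_pyramid` are assumptions for illustration; the abstract does not say how each temporal segment is summarized.

```python
from typing import List, Sequence


def temporal_pyramid(features: Sequence[Sequence[float]], levels: int = 3) -> List[float]:
    """Pool per-frame feature vectors over an equal-split temporal pyramid.

    Level l (0-based) splits the sequence into 2**l segments of (nearly) equal
    length; each segment is summarized by the element-wise mean of its frames
    (mean pooling is an assumption), and all summaries are concatenated.
    The output length is dim * (2**levels - 1).
    """
    n = len(features)
    if n == 0:
        raise ValueError("empty sequence")
    dim = len(features[0])
    out: List[float] = []
    for level in range(levels):
        parts = 2 ** level
        for k in range(parts):
            start = (k * n) // parts
            end = max(((k + 1) * n) // parts, start + 1)  # never pool an empty segment
            segment = features[start:end]
            for d in range(dim):
                out.append(sum(f[d] for f in segment) / len(segment))
    return out


# Example: four frames with a 1-D feature, two pyramid levels.
print(temporal_pyramid([[1.0], [2.0], [3.0], [4.0]], levels=2))  # [2.5, 1.5, 3.5]
```

Level 0 describes the whole action, while deeper levels keep coarse temporal order (what happens early versus late). The split points here are fixed fractions of the sequence length, which is exactly the kind of constant the abstract argues should vary per frame.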
First, we observe that it is unreasonable to assign every frame of an action video to one definite label, because different actions share many states. Moreover, a human action can be recognized from a small number of key frames, which means that key frames are important and necessary for human action recognition. In this part we therefore propose an iterative SVM algorithm based on multiple instance learning that extracts key frames iteratively. Second, when the temporal feature of each frame is extracted, the length of the temporal window (the observation window) must be decided. In other models this window size is usually fixed as a constant chosen by cross-validation. Likewise, traditional temporal pyramid matching splits the time domain equally at each level, which is also a kind of constant. However, since these parameters actually reflect the motion state of each frame, it is unreasonable to treat them as constants shared by all frames. In this paper, we construct a latent structural SVM that treats the temporal window length and the temporal split as latent variables. By learning optimized latent variables for every frame, we obtain more discriminative features. Finally, both offline and online experiments show that the proposed model achieves performance superior to that of state-of-the-art models.
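The key-frame idea can be illustrated with a hypothetical mi-SVM-style alternation (a standard multiple-instance SVM scheme, swapped in here because the abstract does not give the exact algorithm). Each video is a bag of frame features: every frame of a negative video is negative, frames of a positive video start as positive, and the loop alternates between training a linear SVM and relabeling positive-video frames, always keeping at least one positive frame per video. The Pegasos-style solver, the hyperparameters `lam` and `epochs`, and all function names are assumptions.

```python
from typing import List, Sequence, Tuple

Vec = List[float]
Bag = Sequence[Sequence[float]]  # one video: a list of per-frame feature vectors


def _dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, x))


def train_linear_svm(xs: Sequence[Sequence[float]], ys: Sequence[int],
                     lam: float = 0.01, epochs: int = 50) -> Vec:
    """Linear SVM trained with Pegasos-style subgradient steps (fixed sample order).

    A constant 1.0 is appended to every sample so the last weight acts as the bias.
    """
    data = [list(x) + [1.0] for x in xs]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for x, y in zip(data, ys):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * _dot(w, x) < 1.0          # hinge-loss margin check
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularizer shrink step
            if violated:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def _score(w: Vec, x: Sequence[float]) -> float:
    return _dot(w, list(x) + [1.0])


def mil_key_frames(pos_bags: Sequence[Bag], neg_bags: Sequence[Bag],
                   max_iter: int = 10) -> Tuple[List[List[int]], Vec]:
    """Return key-frame indices for each positive video and the final SVM weights."""
    labels = [[1] * len(bag) for bag in pos_bags]      # start: all positive-video frames positive
    neg_frames = [f for bag in neg_bags for f in bag]  # negative-video frames stay negative
    w: Vec = []
    for _ in range(max_iter):
        xs = [f for bag in pos_bags for f in bag] + neg_frames
        ys = [y for bag_labels in labels for y in bag_labels] + [-1] * len(neg_frames)
        w = train_linear_svm(xs, ys)
        new_labels = []
        for bag in pos_bags:
            scores = [_score(w, f) for f in bag]
            bag_labels = [1 if s > 0 else -1 for s in scores]
            # MIL constraint: a positive video keeps at least one positive (key) frame.
            bag_labels[max(range(len(bag)), key=scores.__getitem__)] = 1
            new_labels.append(bag_labels)
        if new_labels == labels:  # labels stable: stop iterating
            break
        labels = new_labels
    keys = [[i for i, y in enumerate(bag_labels) if y == 1] for bag_labels in labels]
    return keys, w


# Toy data: each positive video has one distinctive frame (x = 2) and one
# "shared state" frame (x = -2) that also appears in the negative videos.
pos = [[[2.0], [-2.0]], [[2.0], [-2.0]]]
neg = [[[-2.0], [-2.0]], [[-2.0], [-2.0]]]
keys, w = mil_key_frames(pos, neg)
print(keys)
```

On this toy data the shared-state frames are relabeled as negative after the first round, so only the distinctive frame of each positive video survives as a key frame.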
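The role of a latent variable can be sketched through the inference step alone: for a frame t, the observation-window length h is not fixed but chosen from a candidate set to maximize the class score w·φ(x, t, h). In this hypothetical sketch φ is the mean per-frame feature over the window, and the class weights, labels (`wave`, `kick`) and candidate lengths are hand-set for illustration. Training those weights (for example, by alternating between imputing h for training frames and solving a structural SVM) is not shown.

```python
from typing import Dict, List, Sequence, Tuple


def window_feature(frames: Sequence[Sequence[float]], t: int, h: int) -> List[float]:
    """phi(x, t, h): mean per-frame vector over the window of length h ending at frame t."""
    window = frames[max(0, t - h + 1): t + 1]  # clipped at the start of the video
    dim = len(frames[0])
    return [sum(f[d] for f in window) / len(window) for d in range(dim)]


def best_window(frames: Sequence[Sequence[float]], t: int, w: Sequence[float],
                candidates: Sequence[int]) -> Tuple[int, float]:
    """Latent inference: the window length h maximizing w . phi(x, t, h), and its score."""
    best_h, best_s = candidates[0], float("-inf")
    for h in candidates:
        s = sum(wi * fi for wi, fi in zip(w, window_feature(frames, t, h)))
        if s > best_s:
            best_h, best_s = h, s
    return best_h, best_s


def predict(frames: Sequence[Sequence[float]], t: int,
            class_weights: Dict[str, Sequence[float]],
            candidates: Sequence[int]) -> Tuple[str, int]:
    """Multiclass rule: argmax over (label, h) of w_label . phi(x, t, h)."""
    best_label, best_h, best_s = "", candidates[0], float("-inf")
    for label, w in class_weights.items():
        h, s = best_window(frames, t, w, candidates)
        if s > best_s:
            best_label, best_h, best_s = label, h, s
    return best_label, best_h


frames = [[1.0], [1.0], [1.0], [-1.0]]
print(predict(frames, 3, {"wave": [1.0], "kick": [-1.0]}, candidates=(1, 2, 4)))  # ('kick', 1)
```

Each class picks the window that suits it best at frame 3 (`wave` prefers the long window, `kick` the short one), so the feature adapts to the local motion state instead of using one cross-validated constant. A latent temporal split could be handled the same way by enumerating candidate split points.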
Keywords/Search Tags: Action Recognition, Depth Information, Temporal Pyramid Matching, Multiple Instance Learning, Latent Structural SVM