Spatial and temporal modelling for automatic human behavioral analysis

Posted on: 2017-12-16
Degree: Ph.D.
Type: Thesis
University: The Ohio State University
Candidate: Zhao, Ruiqi
Full Text: PDF
GTID: 2478390017952792
Subject: Electrical engineering
Abstract/Summary:
Human behavioral analysis from video is a topic of major interest in computer vision. It plays a key role in a wide range of human-centric applications. A working technique involves processing both the static visual cues and the temporal dynamics of the entire sequence. In this thesis, we study two tasks for automatic human behavioral analysis from video: recovering 3D shape from 2D landmarks on a single image, and action recognition from video. We propose efficient and accurate methods for these tasks.

Estimating 3D shape from 2D landmarks not only provides richer information about objects such as human bodies and faces but also benefits other higher-level tasks. It is a difficult and ill-posed problem. Previous approaches represent shape deformation using a linear combination of learned 3D shape bases. The linear assumption limits their applicability to human pose, which is highly deformable and articulated. Using the available 3D ground truth for a number of 2D samples, we propose a deep network that directly estimates the mapping function from 2D landmark points to their depth. The 3D shape is recovered, up to a scaling factor, by combining the input and output of the neural network. The system is robust to noise and missing data. It runs at > 1,000 fps during testing and outperforms the state of the art by up to two-fold.

Recognizing action from video is an important step towards automatic human behavioral analysis. It is a very difficult problem because of many challenges, including variations in shape, appearance, lighting, camera view, and scale. However, novel cost-effective depth sensors and robust human pose estimation techniques enable accurate prediction of human skeletons. They can better deal with the above-mentioned challenges. This motivates us to develop a method that can model the spatial and temporal dynamics of body part movements. We achieve this by first segmenting an action into basic components called events. We further represent events and their temporal structures using a labeled graph. We also propose a path-based graph kernel to compute graph similarity. The graph kernel can be plugged into any kernel-based classifier for classification. A natural choice is the Support Vector Machine. The derived approach is not only efficient but also interpretable. We evaluate it on extensive databases and show significant improvement over the state of the art.
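
To give a rough feel for what a path-based kernel on labeled graphs can look like, here is a minimal Python sketch. It is not the kernel proposed in the thesis, which the abstract does not specify. It assumes a generic variant that counts label sequences along short directed paths and takes the dot product of those counts. The function names, the `max_len` parameter, and the event labels such as `raise_arm` are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# A graph is a pair (labels, edges): labels maps node id -> event label,
# edges is a list of directed (u, v) pairs encoding temporal order.
# This representation is an assumption made for this sketch.

def path_label_counts(labels, edges, max_len=3):
    """Count label sequences along all simple directed paths of up to max_len nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    counts = Counter()

    def walk(node, seq, visited):
        counts[seq] += 1                      # record the path ending at this node
        if len(seq) == max_len:
            return
        for nxt in adj.get(node, []):
            if nxt not in visited:            # keep paths simple (no repeated nodes)
                walk(nxt, seq + (labels[nxt],), visited | {nxt})

    for node in labels:
        walk(node, (labels[node],), {node})
    return counts

def path_kernel(g1, g2, max_len=3):
    """Dot product of the two graphs' path-label count vectors."""
    c1 = path_label_counts(*g1, max_len=max_len)
    c2 = path_label_counts(*g2, max_len=max_len)
    return sum(c1[s] * c2[s] for s in c1.keys() & c2.keys())

def gram_matrix(graphs, max_len=3):
    """Pairwise kernel values, in the form a kernel classifier would consume."""
    return [[path_kernel(a, b, max_len) for b in graphs] for a in graphs]

# Hypothetical actions, each a chain of events in temporal order.
wave = ({0: "raise_arm", 1: "move_hand", 2: "lower_arm"}, [(0, 1), (1, 2)])
wave_long = ({0: "raise_arm", 1: "move_hand", 2: "move_hand", 3: "lower_arm"},
             [(0, 1), (1, 2), (2, 3)])
kick = ({0: "raise_leg", 1: "extend_leg", 2: "lower_leg"}, [(0, 1), (1, 2)])

K = gram_matrix([wave, wave_long, kick])
for row in K:
    print(row)
```

Because each kernel value here is an inner product of explicit count vectors, the resulting matrix is symmetric and positive semidefinite, so it could be passed to a classifier that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. The shared label sequences that contribute to a kernel value can also be inspected directly, which is one way such a kernel stays interpretable.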
Keywords/Search Tags:Human, Behavioral analysis, 3D shape, Temporal, Video