
Human Motion Analysis Algorithms In Complicated Surveillance Environments

Posted on: 2018-01-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Zhao
GTID: 1368330542992930
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Video surveillance, also known as closed-circuit television, has become prevalent in public places, commercial buildings and private houses. These widespread cameras have profound implications for crime prevention, employee monitoring and household safety. With such a massive number of cameras and recorded videos, automatic video surveillance technologies based on artificial intelligence are imperative and have attracted considerable attention from researchers and engineers. Among these technologies, human motion analysis aims to automatically answer the two core questions of intelligent video surveillance: "Who is he?" and "What is he doing?" Although this technology has made considerable progress in recent years, several problems still need further research. Existing human motion analysis technologies are applicable only in controlled surveillance environments, which place strict constraints on camera views and human activities. However, videos recorded in public places contain varied shooting views, complex and unpredictable human behaviours, and cluttered backgrounds. Given these characteristics, it is essential to study methods for distinguishing human faces more clearly and understanding human actions better.

Motivated by this, this thesis is dedicated to developing new methods for face and human pose analysis in complicated video surveillance environments. First, to discern human faces more clearly, it studies the problem of frontal face synthesis. Then, to understand human actions better, it studies the problems of human pose estimation and tracking. The main contributions are summarized as follows:

1. Because human faces are non-frontal in most surveillance environments, a frontal face image synthesis method based on triangulation and sparse representation is proposed. The existing rectangular image partition criterion fails to align corresponding patches in profile images and frontal face images. Given an arbitrary profile image, to synthesize a corresponding frontal face image that is smooth in texture and similar in appearance, we introduce a triangulation-based partition criterion and perform synthesis based on sparse representation. The triangulation-based partition ensures that corresponding triangular patches are strictly aligned, and sparse representation adaptively finds the most similar patches for synthesis while discarding dissimilar ones. Furthermore, a confederate learning strategy is proposed to reduce the blocking artifacts caused by the triangulation-based partition. Experimental results demonstrate the effectiveness of the proposed frontal face image synthesis method and its advantages over previous work.

2. Because complicated video surveillance environments typically contain cluttered backgrounds and complex human actions, a hierarchical pictorial structure is proposed for human pose estimation. Typical approaches to this problem use only a single-level structure, which has difficulty capturing varied body appearances and modeling high-order part dependencies. To address this, we build a three-layer Markov network that models the body structure by decomposing the whole body into poselets (combined parts) and then into parts representing joints, which allows body parts to be detected more accurately. In this hierarchical model, parts at different levels are connected through parent-child relationships to represent high-order spatial relationships. Moreover, our model is tree-structured, so it can be trained jointly and admits exact inference. Extensive experimental results show that our model improves on or is on par with state-of-the-art approaches.

3. In video surveillance environments, people may wear different kinds of clothing and may move very quickly and unpredictably. Considering these characteristics, a human pose tracking method based on an integrated tracking and estimation graphical model is proposed. Pose estimation is typically applied to track human pose in videos, but it ignores temporal context and cannot provide smooth, reliable tracking results. Therefore, we develop a tracking and estimation integrated model (TEIM) that fully exploits temporal information by integrating pose estimation with visual tracking. Algorithmically, TEIM is designed so that it (1) enables pose estimation and visual tracking to compensate for each other to achieve desirable tracking results, (2) can deal with the problem of tracking loss, and (3) needs only past information and is therefore capable of tracking online. Comprehensive experimental results indicate the effectiveness of the proposed TEIM framework.

4. For tracking human pose in complicated video surveillance environments, a method based on max-margin Markov models is further proposed. Pose tracking can be modeled by a discrete Markov random field, but because tracking human pose requires coupling limbs in adjacent frames, the model contains loops and becomes intractable for learning and inference. Previous work has resorted to approximate inference strategies, which can overfit to the statistics of a particular dataset, so the performance and generalization of these methods are limited. We approximate the full model with an ensemble of two tree-structured sub-models: Markov networks for spatial parsing and Markov chains for temporal parsing. Both models can be trained jointly using the max-margin technique, so strong generalization ability can be guaranteed. Moreover, an iterative parsing process is proposed to achieve the ensemble inference. Thorough experimental results demonstrate the superior performance of our method over state-of-the-art approaches.
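
The abstract notes that tree-structured models, including the Markov chains used for temporal parsing, allow exact inference. As an illustration only, and not the thesis's implementation, the following minimal Python sketch runs max-sum (Viterbi) dynamic programming on a chain. The function name `chain_map`, the score layout, and the toy 1-D positions are all hypothetical; the sketch assumes each frame provides a few candidate locations for one body part, with a unary score per candidate and a pairwise score that penalizes large jumps between adjacent frames.

```python
from typing import Callable, List, Sequence


def chain_map(unary: Sequence[Sequence[float]],
              pairwise: Callable[[int, int, int], float]) -> List[int]:
    """Exact max-sum (Viterbi) inference on a chain.

    unary[t][k]       -- score of candidate k in frame t (hypothetical layout)
    pairwise(t, j, k) -- score of linking candidate j in frame t-1
                         to candidate k in frame t
    Returns the index of the selected candidate in each frame.
    """
    if not unary:
        return []
    best = list(unary[0])          # best score of any path ending at each candidate
    back: List[List[int]] = []     # back-pointers, one list per frame after the first
    for t in range(1, len(unary)):
        new_best, ptr = [], []
        for k, u in enumerate(unary[t]):
            # Best predecessor in the previous frame for candidate k.
            j_star = max(range(len(best)),
                         key=lambda j: best[j] + pairwise(t, j, k))
            new_best.append(best[j_star] + pairwise(t, j_star, k) + u)
            ptr.append(j_star)
        best = new_best
        back.append(ptr)
    # Backtrack from the best final candidate.
    k = max(range(len(best)), key=lambda i: best[i])
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return path


# Toy example: two candidate positions per frame, three frames.
positions = [[0, 10], [1, 10], [2, 10]]
unary = [[1.0, 0.0], [0.4, 0.5], [1.0, 0.0]]


def smooth(t: int, j: int, k: int) -> float:
    # Assumed smoothness term: penalize displacement between adjacent frames.
    return -0.1 * abs(positions[t][k] - positions[t - 1][j])


path = chain_map(unary, smooth)
print(path)  # [0, 0, 0]
```

In this toy case, choosing the best candidate independently per frame would give `[0, 1, 0]`, which jumps between distant positions. The chain model prefers the temporally consistent path `[0, 0, 0]`, which is the kind of temporal context the abstract says per-frame pose estimation ignores.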
Keywords/Search Tags: face synthesis, pose estimation, pictorial structure, pose tracking, Markov random fields, max-margin