
Semantic Spatial-Structure-Aware Object Understanding For Video Surveillance

Posted on: 2019-07-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Q Zhang
Full Text: PDF
GTID: 1368330572967311
Subject: Information and Communication Engineering
Abstract/Summary:
Video surveillance systems play a significant role in many areas, including national security and public safety. With the development of computing and video surveillance techniques, the amount of video surveillance data has grown exponentially in recent years. However, the unstructured form of video content is a bottleneck that hinders the large-scale application of video surveillance systems. How to effectively extract valuable information from massive video surveillance data, and how to accurately represent and store that data, are key issues that current intelligent video surveillance systems urgently need to solve. The key to structural analysis of video content is to capture comprehensive, fine-grained information about each component of the target object and about the relations among those components, so as to help the computer understand and organize video content effectively. To address this issue, this thesis studies in depth the inherent spatial structures of the target object and the associations among these structures. We introduce a research framework called "spatial-structure semantic perception". Under this framework, we systematically study technologies for the structural representation of targets in video surveillance and propose effective and practical algorithms. Experimental results demonstrate the effectiveness of these algorithms. The main contributions of the thesis are summarized as follows:

1. We clarify the key technologies and fundamental issues in object understanding for intelligent video surveillance systems, and we analyze the possible technical approaches and basic methods needed to address these issues. We propose the "spatial-structure semantic perception" research framework for fine-grained object understanding, matching, and search.
2. Based on this research framework, we systematically study the multi-person pose estimation problem in video surveillance. We argue that the key to tackling challenges such as pose variation and crowding is to fully capture both global structure and local context information. We propose a task-specific, multi-scale U-shaped pose estimation network built on the perception and correlation of low-, middle-, and high-level structures, and achieve a stable and efficient multi-person pose estimation system.
3. We comprehensively study the cross-camera person re-identification problem in video surveillance systems and analyze the differences and challenges that distinguish person re-identification from traditional object recognition. We point out that achieving robust person re-identification on unaligned image pairs requires fully exploiting the middle- and high-level inherent spatial structures of the paired images, and the relations between them, to perform a one-to-one comparison.
4. We further design and implement an efficient person search system that supports the spatial-structure semantic perception framework. We employ knowledge distillation to bring a set of deep, well-engineered expert networks into a unified framework that jointly handles object detection and feature representation, ensuring the accuracy and computational efficiency required by real-time video surveillance systems.
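The abstract does not say how the distillation objective in contribution 4 is constructed. As a rough illustration only, the sketch below assumes a common recipe: a Hinton-style soft-target term that makes the unified student's detection logits follow a detection expert, plus a mean-squared feature-mimicking term that makes the student's re-ID embedding follow a re-ID expert. All function names, the temperature, and the loss weights are hypothetical and are not taken from the thesis.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; subtracting the max keeps exp() numerically stable."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def soft_target_loss(student_logits, teacher_logits, temperature: float = 4.0) -> float:
    """Soft-target distillation: KL between softened teacher and student outputs,
    scaled by T^2 so the term keeps a comparable magnitude across temperatures."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(p_teacher, p_student)


def feature_mimic_loss(student_feat, teacher_feat) -> float:
    """Mean squared error between student and teacher embeddings."""
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / len(student_feat)


def unified_student_loss(student_logits, student_embedding,
                         det_teacher_logits, reid_teacher_embedding,
                         w_det: float = 1.0, w_reid: float = 1.0,
                         temperature: float = 4.0) -> float:
    """Hypothetical combined objective: one student mimics a detection expert
    (class logits) and a re-ID expert (embedding) at the same time."""
    det_term = soft_target_loss(student_logits, det_teacher_logits, temperature)
    reid_term = feature_mimic_loss(student_embedding, reid_teacher_embedding)
    return w_det * det_term + w_reid * reid_term


if __name__ == "__main__":
    # Student matches the detection expert but not the re-ID expert.
    print(unified_student_loss([1.0, 2.0, 3.0], [0.0, 0.0],
                               [1.0, 2.0, 3.0], [1.0, 1.0]))
```

In a real system these terms would be computed on batched tensors and usually added to the ordinary supervised detection and re-ID losses. The scalar version only shows how each expert contributes a separate mimicking term to a single student objective.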
Keywords/Search Tags: Intelligent video surveillance, structure representation, deep learning, pose estimation, person re-identification