Semantic Understanding-Aware Neural Network Architecture Design For Video Surveillance

Posted on: 2020-09-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Y Huang
Full Text: PDF
GTID: 1368330578473941
Subject: Information and Communication Engineering
Abstract/Summary:
Video surveillance systems play an important role in many areas, including public safety and urban management. Recent years have witnessed the rapid development of deep learning techniques, and intelligent video surveillance systems have benefited greatly from deep neural networks, taking advantage of their powerful feature representation capability and end-to-end learning scheme. In combining deep learning with intelligent video surveillance, a key issue is how to design effective, robust, and reliable neural network architectures. In this thesis, we systematically study various aspects of neural network architecture design for surveillance analysis. We design specific neural network architectures for spatio-temporal and multi-modal video semantic understanding, so as to mine, model, and fuse the abundant semantic information in surveillance videos. We also explore automated neural network architecture design. Over the course of the study, we propose a series of innovative algorithms and demonstrate their effectiveness through experiments. The main contributions of this thesis are summarized as follows:

1. We study the modeling and fusion of temporal and spatial semantic information in surveillance videos. We propose novel neural network models for the temporal semantics of objects and the spatial semantics of scenes, respectively, and apply these models to the object trajectory prediction problem. We further study the joint learning of temporal and spatial semantic information in videos, for which we propose a hierarchical cascaded spatio-temporal network model. We apply this model to the video summarization problem to demonstrate its high-level semantic understanding capability.

2. We study the mining and joint learning of multi-modal semantic information in surveillance videos. We propose two novel multi-modal scene semantic models in the context of pedestrian semantic analysis to recover sufficient semantic information from surveillance scene images. We further apply the scene models to the crowd counting task by incorporating the multi-modal semantic information into deep neural networks, enabling robust estimation of the number of people in dense crowds.

3. We study automated neural network architecture design and propose a neural architecture search method for searching tree-structured neural network topologies. In a greedy manner, our method divides the optimization of the global architecture into optimizations of local architectures and solves the problem efficiently through iterative updating. The derived tree-structured architecture effectively models the relationships between attributes, so our method can be applied to various multi-attribute learning problems.
Keywords/Search Tags: Intelligent video surveillance, neural network, spatio-temporal semantic modeling, multi-modal information mining, neural architecture search