Semantic Understanding-Aware Neural Network Architecture Design For Video Surveillance

Posted on:2020-09-14

Degree:Doctor

Type:Dissertation

Country:China

Candidate:S Y Huang

Full Text:PDF

GTID:1368330578473941

Subject:Information and Communication Engineering

Abstract/Summary:

PDF Full Text Request

Video surveillance systems play an important role in many areas, including public safety and urban management. Recent years have witnessed the rapid development of deep learning techniques, and intelligent video surveillance systems have benefited greatly from deep neural networks, owing to their powerful feature representation capability and end-to-end learning scheme. In combining deep learning with intelligent video surveillance, a key issue is how to design effective, robust, and reliable neural network architectures. In this thesis, we systematically study various aspects of neural network architecture design for surveillance analysis. We design specific neural network architectures for spatio-temporal and multi-modal video semantic understanding, so as to mine, model, and fuse the abundant semantic information in surveillance videos. We also explore automated neural network architecture design. Over the course of the study, we propose a series of innovative algorithms and demonstrate their effectiveness through experiments. The main contributions of this thesis are summarized as follows:

1. We study the modeling and fusion of temporal and spatial semantic information in surveillance videos. We propose novel neural network models for the temporal semantics of objects and the spatial semantics of scenes, respectively, and apply these models to the object trajectory prediction problem. We further study the joint learning of temporal and spatial semantic information in videos, for which we propose a novel hierarchical cascaded spatio-temporal network model. We apply this model to video summarization to demonstrate its capability for high-level semantic understanding.

2. We study the mining and joint learning of multi-modal semantic information in surveillance videos. In the context of pedestrian semantic analysis, we propose two novel multi-modal scene semantic models that recover sufficient semantic information from surveillance scene images. We further apply these scene models to the crowd counting task by incorporating the multi-modal semantic information into deep neural networks, enabling robust estimation of the number of people in dense crowds.

3. We study automated neural network architecture design and propose a neural architecture search method for finding tree-structured network topologies. Our method works greedily: it divides the optimization of the global architecture into optimizations of local architectures and solves the problem efficiently through iterative updating. The resulting tree-structured architecture effectively models the relationships between attributes, so the method can be applied to various multi-attribute learning problems.
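The third contribution describes a greedy search that splits the optimization of a global tree-structured topology into local decisions refined by iterative updating. The abstract does not give the algorithm's details, so the following is only a toy sketch of that general idea, not the author's method. It assumes a hypothetical pairwise affinity score between attributes (`affinity`) as a stand-in for the local criterion; an actual architecture search would instead train and evaluate candidate branch structures. The function name `greedy_attribute_tree`, the average-linkage scoring rule, and the example attribute names and scores are all illustrative assumptions.

```python
from itertools import combinations


def greedy_attribute_tree(names, affinity):
    """Greedily build a binary sharing tree over attributes (toy sketch).

    names: attribute names, one leaf (task head) per attribute.
    affinity: dict mapping frozenset({a, b}) -> float. This is a HYPOTHETICAL
        stand-in for a local architecture score; higher means the two
        attributes are assumed to benefit more from sharing a branch.
    Returns a nested tuple describing the tree topology.
    """
    # Each active branch: (subtree, set of attributes it serves)
    branches = [(name, {name}) for name in names]

    def group_score(g1, g2):
        # Local criterion: average pairwise affinity between two groups
        scores = [affinity[frozenset((a, b))] for a in g1 for b in g2]
        return sum(scores) / len(scores)

    while len(branches) > 1:
        # Local step: choose the pair of branches with the best score
        i, j = max(
            combinations(range(len(branches)), 2),
            key=lambda ij: group_score(branches[ij[0]][1], branches[ij[1]][1]),
        )
        (t1, g1), (t2, g2) = branches[i], branches[j]
        # Iterative update: replace the two branches by one shared parent
        branches = [b for k, b in enumerate(branches) if k not in (i, j)]
        branches.append(((t1, t2), g1 | g2))
    return branches[0][0]


# Illustrative attributes and made-up affinities (not data from the thesis)
names = ["gender", "age", "hat", "glasses"]
affinity = {
    frozenset(pair): score
    for pair, score in [
        (("gender", "age"), 0.9),
        (("hat", "glasses"), 0.8),
        (("gender", "hat"), 0.2),
        (("gender", "glasses"), 0.1),
        (("age", "hat"), 0.3),
        (("age", "glasses"), 0.2),
    ]
}
print(greedy_attribute_tree(names, affinity))
# (('gender', 'age'), ('hat', 'glasses'))
```

In this toy reading, each internal node of the returned tree stands for a branch shared by the attributes beneath it, so strongly related attributes end up sharing more layers. Replacing the fixed affinity lookup with a validation-based score for each candidate merge would be one way to bring the local step closer to an actual search.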

Keywords/Search Tags:

Intelligent video surveillance, neural network, spatio-temporal semantic modeling, multi-modal information mining, neural architecture search

PDF Full Text Request

Related items

1	Research On Spatio-temporal Correlation Feature Extraction And Recognition Of Multi-modal Tactile Signals
2	Research On Surveillance Video Synopsis Based On Spatio-Temporal Slice
3	Research And Implementation Of Semantic Association Spatio-Temporal Mining
4	Video Spatio-Temporal Representation Learning Methods
5	Research Of Intelligent Video Surveillance Technology In Family Environment
6	Research On Video Anomaly Detection Algorithm Based On Enhanced Spatio-temporal Features
7	Research On Video Event Recognition Using Deep Network Spatio-temporal Consistency
8	Spatio-temporal Data Mining Based Active Surveillance System Of Infectious Diseases
9	Research On Video Behavior Classification Technology Based On Spatio-Temporal Features
10	Action Recognition Method Based On Multi-frequency Spatio-temporal Feature Learning