Intelligent Video Analysis with Deep Learning

Posted on: 2018-02-22
Degree: Ph.D
Type: Thesis
University: The Chinese University of Hong Kong (Hong Kong)
Candidate: Kang, Kai
Full Text: PDF
GTID: 2448390002498146
Subject: Computer Science
Abstract/Summary:
Intelligent video analysis has drawn increasing attention from the public and authorities for the purposes of video surveillance and on-line video understanding. Semantic segmentation and object detection are fundamental tasks in computer vision and have been widely studied in the image domain. In the video domain, however, open questions remain in both semantic segmentation and object detection.

With the development of deep learning and the advent of large-scale datasets, new technologies have been proposed to address these fundamental problems in video analysis. This thesis discusses four works: a novel fully-convolutional neural network structure for efficient human crowd segmentation with applications in video surveillance, a video object detection framework that integrates a still-image detector and a generic object tracker, a winning framework for the ImageNet video object detection challenge, and a novel tubelet proposal network that efficiently generates spatiotemporal proposals for object detection in videos.

New neural network structures are proposed specifically for video applications. Multi-branch architectures are proposed to incorporate both appearance and motion cues. A 1-D fully convolutional network is proposed to learn temporal features from tubelet features. In addition, an encoder-decoder structure is applied to tubelet proposals to fully encapsulate tubelet appearance features.

Various cues in the video domain are also investigated and utilized for intelligent video analysis. For example, appearance cues are learnt from fully-convolutional neural networks, perspective maps are utilized to modulate convolution kernels to incorporate scale cues, long short-term memory (LSTM) and 1-D convolutional networks are applied to tubelet proposals to learn temporal cues, and context cues are modeled as background subtraction inputs in segmentation or as optical flow propagation in object detection.
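
As a rough illustration of the idea of a 1-D convolution applied along the time axis of tubelet features, the sketch below slides a small kernel over a sequence of per-frame feature vectors. It is not the thesis's network: the function name `temporal_conv1d`, the zero padding, the single output channel, and the example scores are all assumptions made for illustration.

```python
def temporal_conv1d(features, kernel, bias=0.0):
    """Slide a 1-D kernel along the time axis of one tubelet.

    features: list of T per-frame feature vectors, each of length C
    kernel:   list of K weight vectors (K odd), each of length C
    Returns a list of T values; frames outside the tubelet are
    treated as zeros, so the output keeps the tubelet length.
    (Hypothetical helper, not taken from the thesis.)
    """
    T, K = len(features), len(kernel)
    if K % 2 == 0:
        raise ValueError("kernel length must be odd")
    half = K // 2
    out = []
    for t in range(T):
        acc = bias
        for k in range(K):
            src = t + k - half  # frame index covered by kernel tap k
            if 0 <= src < T:
                acc += sum(w * x for w, x in zip(kernel[k], features[src]))
        out.append(acc)
    return out


# Hypothetical tubelet: 5 frames, one detection score per frame.
scores = [[0.9], [0.8], [0.1], [0.85], [0.9]]
# A 3-frame averaging kernel pulls the outlier frame toward its neighbours.
smoothed = temporal_conv1d(scores, [[1 / 3], [1 / 3], [1 / 3]])
```

In a learned model the kernel weights would be trained rather than fixed, and several such layers with nonlinearities would typically be stacked; the fixed averaging kernel here only shows how neighbouring frames in a tubelet can inform each other.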
Keywords/Search Tags: Video, Object detection, Cues, Segmentation