Spatial-temporal Context And Temporal Scheduler For Convolutional Neural Network Based Video Object Detection

Posted on:2020-06-06

Degree:Master

Type:Thesis

Country:China

Candidate:H Luo


GTID:2428330599959585

Subject:Information and Communication Engineering

Abstract/Summary:


Video object detection, which comprises object localization and object classification, is a fundamental task in computer vision. It has many real-world applications, e.g., autonomous driving, video surveillance and smart cities. Recent state-of-the-art feature aggregation approaches for video object detection rely on inferring feature correspondence across frames. Estimating feature correspondence is fundamentally difficult when image quality is poor, e.g., under motion blur, video defocus and object occlusion, so the estimated correspondences are often unstable. In addition, deploying video object detection in real scenarios places high demands on both speed and accuracy because computing capacity is limited, yet most state-of-the-art video object detection algorithms consider only recognition accuracy. To address these problems, we propose two solutions, targeting accuracy and speed respectively. The main contributions of this thesis are as follows:

1. Based on the spatial-temporal context in video, we propose a proposal-level feature aggregation framework that learns to enhance each proposal's feature by modeling the dependencies among proposals within a frame and across frames. By jointly considering visual features, spatial positions and temporal positions, it makes full use of the spatial-temporal context. The proposed method has the following merits: it requires no hand-crafted components (e.g., a feature warping process) and is fully trainable; it circumvents the challenging problem of accurate feature correspondence estimation, which makes it robust to low-quality frames; and it captures the temporal consistency particular to video. We verify its effectiveness on the ImageNet VID dataset: without temporal post-processing, the proposed method improves the single-frame Faster R-CNN baseline by about 6% and outperforms the previous state of the art by 1.4% mAP.

2. Based on convolutional neural networks, we propose a lightweight DorT (Detect or Track) framework for video object detection. DorT formulates video object detection as a sequential decision problem. It achieves a good trade-off between speed and accuracy by combining image object detection, single-object tracking and an accurate learnable scheduler. It runs in real time (over 30 FPS) with low latency and can associate objects across frames, which meets the demands of scenarios such as autonomous driving. Finally, we validate the effectiveness of the proposed method on the large-scale ImageNet VID video dataset.
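The proposal-level aggregation in contribution 1 can be illustrated with a minimal attention sketch. The abstract does not give the thesis's formulas, so everything here is an assumption made for illustration. The `Proposal` class, the scaled dot-product appearance term, the hypothetical center-offset spatial term and the `tau` temporal-decay term are stand-ins. The learned query/key/value projections that a fully trainable module would use are omitted. Each proposal attends to every proposal, both in its own frame and in other frames. Its enhanced feature is the original feature plus the attention-weighted context.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Proposal:
    feature: List[float]  # appearance feature, e.g. RoI-pooled (assumed)
    box: Box              # spatial position
    frame: int            # temporal position (frame index)


def spatial_term(a: Box, b: Box) -> float:
    """Hypothetical spatial prior: negative squared center offset,
    normalized by the size of box a (an assumption, not the thesis's formula)."""
    aw, ah = max(a[2] - a[0], 1e-6), max(a[3] - a[1], 1e-6)
    dx = ((b[0] + b[2]) - (a[0] + a[2])) / (2 * aw)
    dy = ((b[1] + b[3]) - (a[1] + a[3])) / (2 * ah)
    return -(dx * dx + dy * dy)


def attention_weights(props: List[Proposal], i: int, tau: float = 2.0) -> List[float]:
    """Softmax weights of proposal i over all proposals, intra- and inter-frame."""
    q = props[i]
    scale = math.sqrt(len(q.feature))
    logits = []
    for p in props:
        appearance = sum(x * y for x, y in zip(q.feature, p.feature)) / scale
        temporal = -abs(p.frame - q.frame) / tau  # hypothetical temporal decay
        logits.append(appearance + spatial_term(q.box, p.box) + temporal)
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(props: List[Proposal], i: int, tau: float = 2.0) -> List[float]:
    """Enhanced feature for proposal i: original feature + weighted context."""
    w = attention_weights(props, i, tau)
    dim = len(props[i].feature)
    context = [sum(wj * p.feature[k] for wj, p in zip(w, props)) for k in range(dim)]
    return [f + c for f, c in zip(props[i].feature, context)]
```

No explicit correspondence (e.g., optical-flow warping) is computed here. Related proposals are weighted softly, which is the property the abstract credits for robustness to low-quality frames.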
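The detect-or-track loop in contribution 2 can be sketched as a sequential decision over frames. This is a minimal sketch, not the DorT implementation. The callables `detect`, `track` and `should_detect` are hypothetical. `threshold_scheduler` is a hand-written rule standing in for the learnable scheduler the abstract describes.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def threshold_scheduler(min_conf: float = 0.5, max_gap: int = 10) -> Callable[[float, int], bool]:
    """Hand-written stand-in for a learned scheduler: re-detect when tracking
    confidence drops below min_conf or after max_gap consecutive tracked frames."""
    def should_detect(confidence: float, since_detect: int) -> bool:
        return confidence < min_conf or since_detect >= max_gap
    return should_detect


def run_dort(
    frames: Sequence[object],
    detect: Callable[[object], List[Box]],
    track: Callable[[object, List[Box]], Tuple[List[Box], float]],
    should_detect: Callable[[float, int], bool],
) -> Tuple[List[List[Box]], List[str]]:
    """Process frames in order, deciding per frame whether to detect or track."""
    results: List[List[Box]] = []
    actions: List[str] = []
    boxes: List[Box] = []
    confidence, since_detect = 0.0, 0
    for i, frame in enumerate(frames):
        if i == 0 or should_detect(confidence, since_detect):
            boxes = detect(frame)  # expensive image detector
            confidence, since_detect = 1.0, 0
            actions.append("detect")
        else:
            boxes, confidence = track(frame, boxes)  # cheap single-object trackers
            since_detect += 1
            actions.append("track")
        results.append(boxes)
    return results, actions
```

In a learnable variant, `should_detect` would be a trained model rather than a fixed rule. The control flow stays the same: the expensive detector runs only when the scheduler asks for it, and boxes are otherwise propagated by the cheap trackers. This is where the speed gain comes from.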

Keywords/Search Tags:

Object detection, Convolutional neural network, Feature aggregation, End-to-end


Related items

1	Research On Single Color Image Object Detection Method Based On Convolutional Neural Network
2	Research On The Algorithms Of Semantic Segmentation And Object Detection Based On Contextual Aggregation
3	Research On Object Detection Method Based On Convolutional Neural Network
4	Research On Object Detection Based On Deep Convolutional Neural Network
5	Research On Single Target Visual Tracking Algorithm Based On Convolutional Neural Network
6	Object Detection With Region-based Convolutional Neural Network
7	Object Detection Based On Convolutional Neural Network
8	Research On Application Of Convolutional Neural Network In Object Detection Algorithm
9	Research On One-stage Object Detection Algorithm Based On Convolutional Neural Network
10	Research On General Object Detection Method Based On Deep Learning