
Research On Deep Learning Object Detection Based On Spatial And Temporal Context

Posted on: 2020-05-02  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Z H Fu  Full Text: PDF
GTID: 1368330572487991  Subject: Electronic information technology and instrumentation
Abstract/Summary:
Object detection is a core task in computer vision. It aims to localize every object in an image with a tight bounding box and, at the same time, classify each object into the correct category. Object detection supports other high-level computer vision tasks and is widely used in smart cities, autonomous driving, and medical intelligence. In recent years, the academic community has made major advances in the accuracy and speed of object detection algorithms by exploiting the powerful semantic representation capabilities of deep learning. Deep learning object detection nevertheless still faces several challenges in different application scenarios. For example, most multi-scale detectors produce many small-size false positive predictions; detection methods yield too few true positives in scenes where objects of the same category are crowded together; and in real-world surveillance scenarios it is difficult for object detection algorithms to suppress false positives while also increasing true positives. This thesis introduces spatial and temporal context information to address these problems. The main work and contributions of the thesis are summarized as follows.

To address small-size false positives in multi-scale detectors, we propose a previewer module for object detection. The previewer block previews the objectness probability for the potential offset region of each prior box, using stronger features that carry more spatial context information. Experimental analysis shows that independent predictions on the same region from feature layers of different depths help reduce false positives. The results also show that the objectness scores predicted by the previewer modules effectively suppress small-size false positives and thus improve the overall performance of the proposed detector.

To address the shortage of true positives in scenes where objects of the same category are crowded together, we propose a region proposal extension and attention method. It focuses on the core area that contains the target instance, which reduces localization confusion in the region proposal features and ultimately increases the number of true positives. The method achieves an average precision of 74.78% on the KITTI pedestrian benchmark at the hard difficulty level, ranking first among all methods at the time of writing.

To address the challenges of real-world surveillance object detection, we propose a foreground gating and background refining network that aims to suppress false positives and increase true positives simultaneously. The network is a two-stage method. The first stage supplies high-quality region proposals by amplifying feature activations on foreground objects while suppressing background regions using temporal context. The second stage then refines those proposals with pairwise non-local operations that attend to the background image in order to handle the misalignment problem. Experimental results show that the method performs strongly at both suppressing false positives and increasing true positives.
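The abstract does not give the formulation of the pairwise non-local operation, so the following is only a minimal sketch of the general idea, under stated assumptions: each position in a proposal's feature map attends to every position in a background feature map through softmax-normalized dot-product similarity, and the aggregated background features are added back as a residual. The function names `softmax` and `pairwise_non_local` are hypothetical, and the learned projection layers that a real network would apply are omitted (treated as identity).

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def pairwise_non_local(proposal_feats, background_feats):
    """Illustrative pairwise non-local operation (assumed form, not the thesis's).

    proposal_feats:   list of N feature vectors (positions in a region proposal)
    background_feats: list of M feature vectors (positions in the background image)
    Returns N vectors: each proposal vector plus a similarity-weighted
    sum of all background vectors (residual connection).
    """
    refined = []
    for query in proposal_feats:
        # Dot-product similarity between this proposal position and every
        # background position (learned embeddings omitted in this sketch).
        sims = [sum(q * b for q, b in zip(query, ref)) for ref in background_feats]
        weights = softmax(sims)
        # Aggregate background features with the attention weights.
        aggregated = [
            sum(w * ref[d] for w, ref in zip(weights, background_feats))
            for d in range(len(query))
        ]
        # Residual connection keeps the original proposal feature.
        refined.append([q + a for q, a in zip(query, aggregated)])
    return refined
```

Because every proposal position can draw on every background position, such an operation does not require the two feature maps to be spatially aligned, which is consistent with the abstract's stated motivation of dealing with misalignment.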
Keywords/Search Tags:object detection, deep learning, spatial context, temporal context, self-attention module, background subtraction, non-local operation