Research On Image-based Object Detection Based On Deep Learning

Posted on: 2020-03-17, Degree: Doctor, Type: Dissertation
Country: China, Candidate: W Liu, Full Text: PDF
GTID: 1368330611993003, Subject: Information and Communication Engineering
Abstract/Summary:
Image-based object detection is a key problem in computer vision and a fundamental building block for high-level semantic analysis tasks. It has wide applications, such as autonomous driving and video surveillance. In recent years, with the advent of Convolutional Neural Networks (CNN), significant progress has been made in object detection. However, detection in complex scenes still faces a series of challenges, such as tiny object detection, the tradeoff between accuracy and computation, the wide range of object scale variation, and cross-modality knowledge transfer. Viewing image-based object detection through the way a CNN processes information, this dissertation explores different ways of using that information, namely multi-stage prediction, asymptotic prediction and single-shot prediction, in pursuit of detection methods with higher accuracy and less computation. The main contributions are as follows:

1. Within the two-stage detection framework, this dissertation proposes a multi-stage prediction detector based on backward feature enhancement and spatial layout preservation, which improves the detector's adaptability to scale variation. Specifically, in the first stage of the detector, a backward feature enhancement network (BFEN) is proposed to feed semantic information from higher layers back to lower layers, enhancing the discrimination of the lower layers while preserving their higher resolution. In this way, BFEN significantly improves the recall of tiny objects. In the second stage of the detector, a spatial layout preserving network (SLPN) is proposed to preserve the spatial layout of region features as they evolve through the network. In this way, SLPN helps improve the localization of object proposals. Building on these two designs, the proposed detector overcomes the drawbacks of traditional two-stage detectors in feature discrimination and shows its superiority in detecting tiny objects.

2. Within the single-stage detection framework, this dissertation proposes an asymptotic prediction detector based on multi-level feature maps, which significantly improves detection accuracy with only a minor overhead in time consumption. Specifically, an Asymptotic Localization Fitting (ALF) module is proposed to push the anchor boxes towards the ground-truth boxes step by step. ALF not only overcomes the problem of defining positive and negative samples during training, but also helps the detector achieve higher localization performance at test time. We conduct numerous experiments to demonstrate the effectiveness of the proposed ALF. We also investigate how to exploit its advantages and find that the effectiveness of ALF is independent of the CNN backbone. As a result, the proposed detector overcomes the accuracy drawbacks of traditional single-stage detectors while maintaining their advantage in speed.

3. With the aim of abandoning the anchor design, this dissertation proposes a single-shot prediction detector based on object center localization, overcoming the drawback of anchor-based detectors in detecting objects with large scale variation. The proposed detector does not need specific anchors to be designed for different datasets, and it has been shown to be competitive in various detection tasks. Specifically, we formulate object detection as a high-level semantic feature detection task in which the center points of objects are detected by shared convolutions. When aided by scale prediction, the detector is able to generate bounding boxes for the object detection task. Numerous experiments have been conducted to study the pros and cons of the proposed detector, especially the key to its competitive performance. Although structurally simple, the proposed detector performs well on pedestrian detection, face detection, vehicle detection and general object detection over the 20 classes of Pascal VOC. By bypassing the anchor design of traditional detectors, the proposed detector adapts well to objects with large scale variation and takes a solid step towards a simple and efficient detector.

4. In order to perform object detection on infrared images without training labels, this dissertation explores cross-modality knowledge transfer from visible images to infrared images from two perspectives: feature representation and sample data. Based on a comparison and analysis of the detection performance of these two approaches, we propose an adaptive network transfer learning method that successfully adapts a detector trained on visible images to infrared images. Firstly, from the perspective of “feature representation”, we explore an adaptive network transfer learning strategy, which transfers knowledge from the trained visible-image detector to the infrared-image detector by making the feature maps of infrared images approximate those of visible images. Secondly, from the perspective of “sample data”, we explore how to use Generative Adversarial Networks (GAN) to translate labeled visible images into fake infrared images. Given these generated fake infrared images, we are able to train a detector that is then tested on real infrared images. Our experiments show that knowledge transfer at the feature level performs better than transfer at the data level, and that the performance achieved by feature-level knowledge transfer is comparable to the baseline performance of supervised learning.

Finally, we summarize the above work and highlight several interesting research directions for future work.
Keywords/Search Tags:Convolutional Neural Networks, Image-based object detection, Pedestrian detection, Face detection, Unsupervised transfer learning
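
The center-point formulation in the third contribution (detect object centers, then use a predicted scale to form a box) can be illustrated with a small sketch. This is not the dissertation's implementation: the function name `decode_centers`, the separate height and width maps, the `stride` and `threshold` values, and the 3x3 local-maximum rule are all assumptions chosen for illustration.

```python
def decode_centers(center_map, height_map, width_map, stride=4, threshold=0.5):
    """Turn per-cell center scores and scale predictions into boxes.

    center_map[y][x]  : score in [0, 1] that cell (x, y) holds an object center.
    height_map[y][x]  : predicted box height in input-image pixels (assumed).
    width_map[y][x]   : predicted box width in input-image pixels (assumed).
    stride            : assumed downsampling factor between image and maps.
    Returns a list of (x1, y1, x2, y2, score) tuples.
    """
    rows, cols = len(center_map), len(center_map[0])
    boxes = []
    for y in range(rows):
        for x in range(cols):
            score = center_map[y][x]
            if score < threshold:
                continue
            # Keep a cell only if no neighbour in its 3x3 window scores higher
            # (tied neighbours are both kept in this simple sketch).
            is_peak = all(
                score >= center_map[ny][nx]
                for ny in range(max(0, y - 1), min(rows, y + 2))
                for nx in range(max(0, x - 1), min(cols, x + 2))
            )
            if not is_peak:
                continue
            # Map the cell center back to input-image coordinates.
            cx = (x + 0.5) * stride
            cy = (y + 0.5) * stride
            h = height_map[y][x]
            w = width_map[y][x]
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, score))
    return boxes


# Toy 3x3 maps with a single confident center in the middle cell.
center = [[0.1, 0.1, 0.1],
          [0.1, 0.9, 0.1],
          [0.1, 0.1, 0.1]]
heights = [[20.0] * 3 for _ in range(3)]
widths = [[10.0] * 3 for _ in range(3)]
print(decode_centers(center, heights, widths))
```

Because each box comes directly from a center score and a scale value, no anchor shapes need to be chosen per dataset, which is the property the abstract emphasizes.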