Optical Image Scene Perception With Deep Learning Techniques

Posted on: 2022-09-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J M Pang
Full Text: PDF
GTID: 1488306329966749
Subject: Optical Engineering
Abstract/Summary:
The optical image is an essential carrier for recording the dynamic real world and for helping perceptual systems perceive and understand their surrounding environments. Optical image scene perception aims to enable intelligent systems to independently understand the contents of images, such as the categories and locations of objects. It is a fundamental component of many visual intelligent systems and is widely used in applications such as remote sensing and autonomous driving.

This thesis presents a series of studies on optical image scene perception with deep learning, focusing primarily on object perception algorithms. It aims to build an end-to-end network capable of detecting, segmenting, and tracking objects in an image or a video under any imaging conditions or scenarios, thereby building a unified, efficient, and accurate visual perception system.

This thesis starts from object detection, one of the most fundamental problems in scene perception, and investigates small object detection in large-scale remote sensing images. Remote sensing images are usually vast and complex, so existing object detectors are not fast or accurate enough for practical use. To build a robust and efficient object detector, this thesis proposes a unified and self-reinforced network called the remote sensing region-based convolutional neural network (R2-CNN). It comprises a Tiny-Net backbone, an intermediate global attention block, and a final classifier and detector. The classifier and detector are mutually reinforced through end-to-end training, which further speeds up processing and suppresses false alarms. The method can process a GF-1 image in 29.4 s on a Titan X using a single thread. To our knowledge, this method is the first solution that can detect tiny objects in such huge remote sensing images gracefully.

Compared to object detection, instance segmentation is a finer-grained, pixel-wise instance recognition problem. In recent years, this problem has advanced rapidly through various model architectures. However, the training process, which is also crucial to the success of detectors, has received relatively little attention. This thesis carefully revisits the standard training practice of detectors and finds that detection performance is often limited by imbalance during training. To mitigate the resulting adverse effects, this thesis proposes Libra R-CNN, a simple yet effective framework for balanced learning in instance recognition. It integrates IoU-balanced sampling, a balanced feature pyramid, and objective re-weighting, which reduce the imbalance at the sample, feature, and objective levels, respectively. It is the first work that systematically revisits the imbalanced training problem in object detection. The method is verified on the MS COCO, LVIS, and Pascal VOC datasets and obtains superior results without introducing much extra cost at inference time.

The real world unfolds as a dynamic sequence, so multiple object tracking is the next crucial problem for real-world scene perception. Although tracking is inherently related to object detection, recent methods usually treat the two as independent problems. This thesis proposes quasi-dense similarity learning, which can be directly combined with most existing detection methods to build Quasi-Dense Tracking (QDTrack). In contrast to previous methods that use sparse ground-truth matching as the training objective, QDTrack densely samples hundreds of region proposals on a pair of images for contrastive learning. This greatly improves the representation ability of the network, so that the resulting distinctive feature space admits a simple nearest neighbor search at inference time. The network can thus track objects without resorting to displacement regression or motion priors. Despite its simplicity, QDTrack outperforms all existing methods on the MOT, BDD100K, Waymo, and TAO tracking benchmarks.

To sum up, this thesis conducts an in-depth exploration of multi-object detection, segmentation, and tracking. It proposes a series of algorithms that greatly improve the perception ability of machine vision systems. These methods have been applied, or can be applied, to remote sensing, autonomous driving, and other practical applications. The achievements have strong potential to create substantial socioeconomic value.
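
As an illustration of the sample-level balancing that the abstract attributes to Libra R-CNN, the following is a minimal sketch of IoU-balanced negative sampling: candidate negatives are grouped into equal-width IoU bins and roughly the same number is drawn from each bin, so that the many low-IoU (easy) candidates do not crowd out the rarer high-IoU (hard) ones. The function name `iou_balanced_sample`, the default of three bins, and the 0.5 negative threshold are assumptions made for this example; the abstract does not specify them, and this is not the thesis implementation.

```python
import random


def iou_balanced_sample(ious, num_samples, num_bins=3, max_iou=0.5, seed=0):
    """Return sorted indices of sampled negatives, spread evenly over IoU bins.

    ious        : max IoU of each candidate box with any ground-truth box
    num_samples : how many negatives to draw in total
    num_bins    : number of equal-width IoU bins (assumed value)
    max_iou     : candidates with IoU >= max_iou are not negatives (assumed value)
    """
    rng = random.Random(seed)
    width = max_iou / num_bins

    # Group candidate indices by the IoU bin they fall into.
    bins = [[] for _ in range(num_bins)]
    for idx, iou in enumerate(ious):
        if 0.0 <= iou < max_iou:
            bins[min(int(iou / width), num_bins - 1)].append(idx)

    # Draw an equal share from every bin (fewer if a bin is too small).
    per_bin = num_samples // num_bins
    chosen = []
    for members in bins:
        chosen.extend(rng.sample(members, min(per_bin, len(members))))

    # Top up at random from the remaining candidates if some bins fell short.
    chosen_set = set(chosen)
    leftovers = [i for members in bins for i in members if i not in chosen_set]
    shortfall = min(num_samples - len(chosen), len(leftovers))
    chosen.extend(rng.sample(leftovers, shortfall))
    return sorted(chosen)


# 90 easy candidates (IoU 0.01) and 10 harder ones (IoU 0.2 and 0.4).
ious = [0.01] * 90 + [0.2] * 5 + [0.4] * 5
picked = iou_balanced_sample(ious, num_samples=6)
hard = [i for i in picked if i >= 90]
```

With uniform random sampling, the ten harder candidates above would make up about 10% of the draw on average; with per-bin sampling they fill the two higher-IoU bins, which is the imbalance reduction the abstract describes at the sample level.
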
Keywords/Search Tags: Deep learning, optical images, scene perception, object detection, instance segmentation, multiple object tracking