
Research On Salient Object Detection Via Scene Geometric Information

Posted on: 2022-09-19
Degree: Master
Type: Thesis
Country: China
Candidate: Z K Rong
Full Text: PDF
GTID: 2518306509977379
Subject: Information and Communication Engineering
Abstract/Summary:
Salient object detection is essential for progress in image understanding and has shown great potential in various computer vision and image processing tasks. Existing salient object detection methods can be roughly divided into three categories according to their input: RGB, RGB-D, and light field images. Unlike RGB data, RGB-D and light field data provide accurate geometric information about the scene through depth maps, multi-view images, and focal stacks. Such abundant geometric information produces efficient saliency features for salient object detection in challenging scenes. However, salient object detection, as a pre-processing step for many tasks, should be efficient and versatile. To this end, three practical problems need to be overcome: the deficiency of geometric information in terms of scale, category, and element types limits the generalization of deep models; geometric information is inconvenient to obtain in practical applications; and processing high-dimensional geometric information consumes substantial computing resources. To confront these challenges, we make the following contributions:

(1) We introduce a large-scale dataset (DUTLF-V2), containing 102 classes and 4204 samples, to enable versatile applications for RGB, RGB-D, and light field salient object detection.

(2) To address the high cost of acquiring multi-view images, we show that salient object detection can be decomposed into two sub-problems: light field multi-view synthesis and multi-view salient object detection. We propose a high-quality light field synthesis network to produce reliable multi-view images, and then a novel multi-view salient object detection network that effectively integrates the multi-view saliency predictions. The proposed method outperforms state-of-the-art RGB, RGB-D, and light field methods on the multi-view dataset.

(3) To address the high computing and memory consumption of the focal stack, we introduce an asymmetrical two-stream architecture. First, we design a teacher network that learns to exploit focal slices, targeting the higher requirements of desktop computers, while transferring comprehensive focusness knowledge to the student network. Second, we propose two distillation schemes that train the student network toward memory and computation efficiency while preserving performance. Our teacher network achieves state-of-the-art results on three light field datasets, and the student network achieves top-4 accuracies while reducing the model size by 56% and boosting frames per second (FPS) by 159% compared with the best-performing method.

(4) To address the high risk of collecting depth information with depth sensors, we propose a depth distiller (A2dele) that uses network prediction and attention as two bridges to transfer depth knowledge from the depth stream to the RGB stream. In A2dele, the adaptive depth distillation scheme aims to achieve the desired control over the pixel-wise depth knowledge transferred to the RGB stream, and the attentive depth distillation scheme focuses on transferring localization knowledge to the RGB features. Our RGB stream achieves state-of-the-art performance on five RGB-D datasets while reducing the model size by 76% and running 11 times faster than the best-performing method. Furthermore, A2dele can be applied to improve the efficiency of RGB-D methods by a large margin while maintaining performance.
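
As a rough illustration of the general idea behind the distillation schemes mentioned above (a student stream supervised partly by a teacher stream's prediction), the following minimal sketch blends a ground-truth loss with a teacher-matching loss over per-pixel saliency probabilities. It is not the thesis's actual formulation: the function names `bce` and `distillation_loss` and the fixed `weight` parameter are assumptions made for illustration, and the abstract's adaptive, pixel-wise control of transferred knowledge is simplified here to a single constant.

```python
import math


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between two equal-length lists of probabilities."""
    total = 0.0
    for p, t in zip(pred, target):
        # Clamp predictions so that log() never receives 0.
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)


def distillation_loss(student, teacher, truth, weight=0.5):
    """Blend ground-truth supervision with supervision from the teacher's prediction.

    weight is a hypothetical fixed factor: 0 uses only the ground truth,
    1 uses only the teacher's soft prediction.
    """
    return (1.0 - weight) * bce(student, truth) + weight * bce(student, teacher)


# Flattened saliency maps (one probability per pixel), purely illustrative values.
student = [0.9, 0.1]
teacher = [0.8, 0.2]
truth = [1.0, 0.0]
loss = distillation_loss(student, teacher, truth)
```

In a two-stream setting, the teacher's predictions would come from the stream that sees the extra geometric input (for example depth or focal slices), so that only the lighter student stream is needed at inference time.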
Keywords/Search Tags: Salient Object Detection, Geometric Information, Light Field, Depth, Knowledge Distillation