
Deeply Supervised Approaches For Salient Object Detection In Complex Scenarios

Posted on: 2022-05-13  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Inam Ullah  Full Text: PDF
GTID: 1488306608977289  Subject: Physics
Abstract/Summary:
Salient object detection (SOD) aims to detect and segment the most distinctive regions of an image. It plays an essential role in many computer vision tasks, such as visual tracking, object recognition, image editing, image retrieval, video analysis, and robot navigation. It also serves as a pre-processing step that limits the search space for other processes, such as feature extraction, feature matching, and compression. Heuristic methods mainly solve the problem by leveraging hand-crafted features (e.g., color and contrast) to capture local details and generate saliency maps accordingly. However, these methods cannot capture high-level semantic information, so they can hardly locate and segment integral salient objects in challenging scenarios. Recently, convolutional neural networks (CNNs) have significantly advanced salient object detection by extracting low-level details and high-level semantic information simultaneously. Due to the pyramid-like CNN architecture and its subsampling operations (e.g., pooling and strided convolution), low-level features contain rich spatial detail but are full of background noise, whereas high-level features have smaller spatial sizes but preserve more semantic knowledge, which supports locating salient objects and suppressing background noise. Thus, it is essential to aggregate these features in a sophisticated manner to generate accurate saliency maps. Based on these observations, various saliency detection methods have been proposed.

Despite the decent performance of these methods, several problems remain unsolved. First, CNN-based approaches have trouble segmenting salient objects of different scales. Several techniques address this by integrating multi-level features layer by layer, but this simple fusion strategy cannot fully exploit multi-scale context-aware features to enhance the single-layer representation. Second, because of the pyramid fusion structure, high-level semantics are progressively transmitted to the shallower layers, so the semantically salient cues captured by deeper layers may be gradually diluted during the progressive fusion. As a result, the predicted maps tend to have incomplete object structures or over-predicted foreground regions. Third, some methods exploit intra-layer multi-scale information using the atrous spatial pyramid pooling (ASPP) module or the pyramid pooling module (PPM) but pay little attention to the differences in receptive field size, which produces gridding artifacts and can degrade performance. Fourth, most previous works use the standard binary cross-entropy (BCE) loss, which assigns equal weight to every pixel and therefore yields saliency maps with blurred boundaries. Some works adopt recursive and back-propagation schemes with deep supervision to refine boundary information, but the salient object boundaries are still not explicitly modeled. Others use a CRF as a post-processing stage to preserve edges, which is time-consuming at inference.

To address these challenges, this thesis follows sophisticated approaches that learn rich contextual information through efficient multi-scale and multi-level fusion schemes. The main focus is on resolving scale-specific issues, countering feature dilution during top-down propagation, applying large receptive fields without feature degradation, and generating salient objects with more accurate boundaries. The main works and contributions of the thesis are as follows:

1. To address the issue that simple layer-by-layer integration cannot segment objects of different scales, a multi-scale approach with multi-side-output supervision is proposed that refines the low-level and high-level features by appending various plain and atrous convolutions. The method uses skip- and short-connections in the encoder-decoder to progressively guide the lower-level features. The embedded recurrent convolutional layer and multi-level supervision with a weighted binary cross-entropy loss enhance detection capability and segment salient objects with more precise boundaries.

2. To address the dilution of contextual features during top-down propagation, a Global Context-aware Multi-scale Feature Aggregative network (GCMANet) is proposed. The method learns adjacent and cross-wise adjacent features to cope with objects of different scales. To avoid gridding artifacts and sparse connections while extracting rich contextual features with large receptive fields, factorized k×1 and 1×k convolutions with large kernels are used, providing denser connections with fewer parameters than ASPP. Furthermore, a global context flow module and self-interactive learning modules prevent dilution of the global features and guide the model to focus on salient regions while discarding non-salient ones. GCMANet is compared with 19 state-of-the-art approaches on five public datasets, and experimental results show that it achieves the best performance under five evaluation metrics.

3. To address the difficulty of segmenting objects of different scales and the blurred boundaries produced by binary cross-entropy, which gives equal importance to all pixels, an Edge-aware Stacked Attention Refinement Network (ESARNet) is proposed, built on a novel Cross Feature Integration Module (CFIM) and an Edge-aware Guidance Module (EGM). The CFIM extracts sub-regions of the salient object at different scales, and the EGM is a joint edge-learning scheme that maintains sharper boundaries. Moreover, a multi-scale feature extraction module extracts semantically richer features, and top-to-bottom short connections eliminate the dilution of contextual features. Extensive experiments on five challenging benchmark datasets show the superiority of the proposed model over state-of-the-art methods under different evaluation metrics.
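The layer-by-layer top-down fusion that the abstract critiques can be sketched with a toy pyramid of single-channel feature maps. This is a minimal illustration of upsample-and-add fusion in plain numpy, not the thesis's actual network; the map sizes and the nearest-neighbour upsampling are illustrative assumptions.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (H, W) map."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def top_down_fuse(features):
    """Fuse a pyramid of single-channel maps, ordered shallow (largest)
    to deep (smallest), by repeated upsample-and-add from the deepest level."""
    fused = features[-1]  # deepest, coarsest map
    for f in reversed(features[:-1]):
        fused = upsample2x(fused) + f
    return fused

# Toy pyramid: 8x8 (shallow), 4x4, 2x2 (deep), each filled with ones.
pyr = [np.ones((8, 8)), np.ones((4, 4)), np.ones((2, 2))]
out = top_down_fuse(pyr)
assert out.shape == (8, 8)        # fused map recovers the finest resolution
assert np.allclose(out, 3.0)      # each location accumulates all three levels
```

The additive path shows why deep semantic cues can dilute: by the time the deepest response reaches the finest level it has been upsampled repeatedly and mixed with every intermediate map, which is the effect the proposed short connections and global context flow are designed to counter.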
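The parameter saving claimed for the factorized k×1 and 1×k convolutions in GCMANet can be checked with simple weight-count arithmetic. The channel count and kernel size below are hypothetical examples, not values from the thesis.

```python
def conv_params(c_in, c_out, kh, kw):
    """Weight count of a single convolution layer (bias omitted)."""
    return c_in * c_out * kh * kw

c = 64   # hypothetical channel count
k = 7    # hypothetical large kernel size

full = conv_params(c, c, k, k)                            # one k x k conv
factorized = conv_params(c, c, k, 1) + conv_params(c, c, 1, k)  # k x 1 then 1 x k

assert full == 200704
assert factorized == 57344
assert factorized < full  # factorization covers a k x k receptive field cheaper
```

A k×k kernel costs O(k²) weights per channel pair, while the k×1 + 1×k pair costs O(2k) yet still spans a k×k receptive field with dense (non-dilated) sampling, which is how such factorizations avoid the gridding artifacts of heavily dilated ASPP branches.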
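The weighted binary cross-entropy mentioned in the first contribution can be sketched as a per-pixel BCE whose foreground pixels are up-weighted. This is a generic illustration of the idea, assuming hypothetical weights `w_fg` and `w_bg`; the thesis's exact weighting scheme is not specified in the abstract.

```python
import numpy as np

def weighted_bce(pred, target, w_fg=2.0, w_bg=1.0, eps=1e-7):
    """Pixel-wise binary cross-entropy with per-class weights.

    Standard BCE weights every pixel equally, which blurs boundaries;
    up-weighting foreground pixels (w_fg > w_bg) penalizes foreground
    mistakes more. The weight values here are illustrative assumptions.
    """
    pred = np.clip(pred, eps, 1.0 - eps)          # numerical stability
    weights = np.where(target == 1, w_fg, w_bg)   # per-pixel weight map
    loss = -(target * np.log(pred) + (1 - target) * np.log(1 - pred))
    return float(np.mean(weights * loss))

target = np.array([[1.0, 0.0], [0.0, 1.0]])
sharp  = np.array([[0.9, 0.1], [0.1, 0.9]])  # confident prediction
blurry = np.array([[0.6, 0.4], [0.4, 0.6]])  # uncertain prediction

assert weighted_bce(sharp, target) < weighted_bce(blurry, target)
```

With `w_fg = w_bg = 1.0` the function reduces to the standard unweighted BCE, making the contrast between the two loss formulations explicit.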
Keywords/Search Tags: Visual Saliency Detection, Salient Object Detection (SOD), Multi-Level Integration, Supervised Learning, Deep Learning