
Research On Visual Saliency In 3D Natural Scenes

Posted on: 2021-04-12    Degree: Master    Type: Thesis
Country: China    Candidate: B Li
GTID: 2518306104486454    Subject: Information and Communication Engineering
Abstract/Summary:
Recent years have witnessed a paradigm shift from 2D to 3D in computer vision applications. 3D visual saliency detection, which plays a fundamental role in these applications, has attracted considerable research interest. Compared to traditional 2D saliency models, 3D saliency models exploit saliency cues from different modalities, including 2D, depth and motion cues, which help to improve detection performance. These multi-modality cues both complement and compete with each other, and how to fuse them effectively remains a challenge. In this thesis, we first investigate the fusion of multi-modality saliency cues based on RGB-D video, the most commonly used form of 3D visual data. Then, to further improve detection performance, we take light field data as the input. Compared to RGB-D data, light field data provides additional saliency cues, such as focusness cues. How to adaptively exploit focal slices of varying quality is a challenging problem for light field saliency models, and we address it in this thesis as well.

The main contributions of this thesis are summarized as follows:

(1) We propose a saliency detection model based on RGB-D video. First, multi-modality saliency maps are computed for each frame, including 2D, depth and motion saliency maps. To suppress the influence of noise in the depth map, we apply clustering within each superpixel to compute its representative depth value.

(2) To handle the complementarity and competition among the multi-modality saliency cues, we propose a saliency fusion algorithm based on a superpixel-level conditional random field model. A global energy function jointly accounts for the multi-modality saliency cues and a smoothness constraint between neighboring superpixels, and the weighting factors of the multi-modality saliency maps are learned by a weight-learning network (an illustrative sketch of this fusion is given after the contribution list). Experimental results show that the proposed model achieves the best performance on two RGB-D saliency datasets, and the qualitative results show that it accurately locates the salient regions and eliminates the background noise caused by competing multi-modality cues.

(3) We propose a novel attentive multi-level recurrent network (AMR-Net) for saliency detection on light field data. AMR-Net consists of a feature extraction network and a hierarchy of attentive focal slice weighting (AFSW) modules. The AFSW module adaptively assigns importance to different focal slices by weighting their features, and effectively integrates spatial, depth and focusness saliency cues (see the second sketch after the contribution list). Experimental results show that the proposed model achieves the best performance on two light field saliency datasets. They also verify that the model adaptively suppresses the interference of inaccurately focused slices, eliminates background noise, and effectively separates salient objects from complex backgrounds.
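The fusion step in contribution (2) is only summarized in this abstract, so the following is a minimal illustrative sketch rather than the thesis implementation. It assumes a quadratic global energy whose unary terms compare the fused map with each modality's saliency map under learned weights, a pairwise smoothness term over neighboring superpixels, and a simple coordinate-descent solver; all names (fuse_saliency, cues, weights, neighbors, lam) are hypothetical.

    # Illustrative sketch only: superpixel-level energy-minimization fusion of
    # multi-modality saliency cues (2D, depth, motion). The closed-form
    # coordinate-descent updates and all names are assumptions, not thesis code.
    import numpy as np

    def fuse_saliency(cues, weights, neighbors, lam=0.5, iters=50):
        """cues: (M, N) per-modality saliency for N superpixels, values in [0, 1].
        weights: (M,) learned importance of each modality.
        neighbors: list of (i, j) index pairs of adjacent superpixels.
        Minimizes  E(s) = sum_i sum_m w_m (s_i - c_{m,i})^2
                          + lam * sum_{(i,j)} (s_i - s_j)^2."""
        M, N = cues.shape
        s = np.average(cues, axis=0, weights=weights)   # initial fused map
        adj = [[] for _ in range(N)]
        for i, j in neighbors:
            adj[i].append(j)
            adj[j].append(i)
        wsum = weights.sum()
        for _ in range(iters):
            for i in range(N):
                # setting dE/ds_i = 0 gives a closed-form per-superpixel update
                num = np.dot(weights, cues[:, i]) + lam * sum(s[j] for j in adj[i])
                den = wsum + lam * len(adj[i])
                s[i] = num / den
        return np.clip(s, 0.0, 1.0)

Each update is a weighted average of the modality cues for that superpixel and the current values of its neighbors, so the smoothness term (lam) controls how strongly neighboring superpixels pull toward the same saliency value.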
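The AFSW module in contribution (3) is likewise described only at a high level. The sketch below shows the general idea of attention over focal-slice features: pool each slice's feature map, score it, normalize the scores with a softmax, and take a weighted sum. The scoring rule (global average pooling plus a learned projection) and all names are assumptions for illustration, not the actual network definition.

    # Illustrative sketch only: attention-style weighting of focal-slice features,
    # the general idea behind attentive focal slice weighting. Names and the
    # scoring rule are assumed, not taken from the thesis.
    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def weight_focal_slices(slice_feats, proj):
        """slice_feats: (K, C, H, W) features of K focal slices.
        proj: (C,) learned scoring vector (assumed here for simplicity).
        Returns the per-slice attention weights and the fused (C, H, W) feature."""
        pooled = slice_feats.mean(axis=(2, 3))            # (K, C) global average pooling
        scores = pooled @ proj                            # (K,) one score per slice
        alpha = softmax(scores)                           # importance of each slice
        fused = np.tensordot(alpha, slice_feats, axes=1)  # weighted sum over slices
        return alpha, fused

    # Example: slices from poorly focused regions should receive lower weights.
    feats = np.random.rand(12, 64, 32, 32)
    alpha, fused = weight_focal_slices(feats, np.random.rand(64))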
Keywords/Search Tags: RGB-D saliency, conditional random field, global energy function, light field saliency, attention mechanism