Salient Object Detection For RGB-D Images Based On Multi-modal Fusion

Posted on: 2021-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: Q T Duan
Full Text: PDF
GTID: 2428330629480278
Subject: Software engineering
Abstract/Summary:
Salient object detection aims to locate the most visually attractive objects in an image. As a branch of computer vision, it has received increasing attention in recent years. Used as a preprocessing step for other computer vision tasks, such as image segmentation, scene reconstruction, and visual tracking, it can save considerable time and space costs. At present, salient object detection for single images is mainly divided into detection for RGB images and detection for RGB-D images. With the adoption of depth sensors, depth information has proved to be a particularly valuable cue for salient object detection, and many RGB-D salient object detection methods have therefore been proposed. The early works all relied on hand-crafted features and simply fused RGB and depth features, so they lacked global context; in scenes with complex backgrounds in particular, these traditional methods failed to achieve good results. In recent years, with the development of deep learning, various RGB-D models based on convolutional neural networks have been applied to salient object detection. These models are mainly of two types: input fusion, that is, a single-stream network architecture; and late-stage fusion, that is, a double-stream network architecture. This thesis presents work on RGB-D salient object detection based on both architectures.

The first work designs an RGB-D salient object detection model based on a single-stream network. It takes a four-channel RGB-D input and applies progressive, parallel spatial and channel attention mechanisms to improve feature representation. Spatial and channel attention focus on the positions and channels in the image that respond more strongly to salient objects. Each of the two attentive features is refined by the corresponding attentive feature from the higher layer, and both are fed in parallel into a recurrent convolutional layer to generate side-output saliency maps, guided by the saliency map from the higher layer. Finally, the multi-level saliency maps are fused from a multi-scale perspective. Experiments on benchmark datasets demonstrate that the parallel attention mechanism and the progressive optimization play an important role in improving detection accuracy, and that the model outperforms state-of-the-art models on the evaluation metrics.

Existing RGB-D salient object detection models based on the double-stream architecture treat RGB and depth data equally in the multi-modal case and extract features from the two in almost identical ways. Because depth features in the lower layers contain a lot of noise, image features are not well characterized. Therefore, a multi-modal feature-fused supervision network for RGB-D salient object detection is proposed. The double-stream network learns RGB and depth data independently, double-stream side-supervision modules obtain saliency maps at each layer, and a multi-modal feature fusion module then fuses the higher-level RGB and depth information from the last three layers of VGG16 to generate improved saliency predictions. From the first to the fifth layer, the network progressively generates the features of the RGB and depth modalities. From the fifth to the third layer, multi-modal fused features are generated with high-level features guiding low-level ones. From the second to the first layer, the fused feature generated at the third layer is used to progressively refine the RGB features of the first two layers. The final output is a saliency map that contains both low-level RGB information and high-level RGB-D multi-modal information. Experiments on three public datasets show that the proposed network performs better and is more robust than current RGB-D saliency detection models, owing to the double-stream side-supervised module and the multi-modal feature-fused module.

This thesis proposes two different models and solutions for RGB-D salient object detection and obtains good results, laying a foundation for other computer vision tasks.
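The idea of parallel channel and spatial attention over a four-channel RGB-D input can be illustrated with a minimal sketch. The abstract does not give the layer definitions, so everything below is an assumption made for illustration: the gates are parameter-free (global average pooling followed by a sigmoid), the two branches are combined by summation, and the function names are hypothetical. The learned convolutions, the progressive refinement from higher layers, and the recurrent convolutional layers of the actual model are not reproduced.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(feat):
    """One gate per channel from global average pooling.
    feat is a list of C channels, each an H x W list of lists.
    Assumption: no learned weights, only pooling plus sigmoid."""
    gates = []
    for channel in feat:
        total = sum(sum(row) for row in channel)
        count = len(channel) * len(channel[0])
        gates.append(sigmoid(total / count))
    return gates

def spatial_attention(feat):
    """One gate per spatial position from the mean across channels.
    Assumption: no learned weights, only pooling plus sigmoid."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    return [
        [sigmoid(sum(feat[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]

def parallel_attention(feat):
    """Apply both gates to the same input in parallel and sum the two
    attended features (summation is an illustrative choice)."""
    ch_gates = channel_attention(feat)
    sp_gates = spatial_attention(feat)
    return [
        [
            [v * ch_gates[k] + v * sp_gates[i][j] for j, v in enumerate(row)]
            for i, row in enumerate(channel)
        ]
        for k, channel in enumerate(feat)
    ]

# Toy 2 x 2 input with four channels (R, G, B, depth); values are made up.
rgbd = [
    [[0.9, 0.1], [0.8, 0.2]],  # R
    [[0.7, 0.1], [0.6, 0.2]],  # G
    [[0.5, 0.1], [0.4, 0.2]],  # B
    [[1.0, 0.0], [1.0, 0.0]],  # depth
]
attended = parallel_attention(rgbd)
```

In this toy input the left column is bright in every channel, so its spatial gate is larger than that of the right column, which is the behaviour the abstract describes as paying more attention to positions that respond strongly to salient objects.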
Keywords/Search Tags: Salient Object Detection, Convolutional Neural Network, Multi-modal, Attention Mechanism, Single-stream Network