
Research on RGB-D Visual Salient Object Detection Algorithms Based on Feature Fusion

Posted on: 2022-11-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J J Wu
Full Text: PDF
GTID: 1488306764498844
Subject: Software engineering
Abstract/Summary:
Visual salient object detection is an important research direction in computer vision. Its core task is to quickly locate the most valuable regions in complex visual scenes while selectively filtering out non-essential information, by simulating the human visual attention mechanism. This makes the technology widely applicable in practice, for example in autonomous driving, human-computer interaction, intelligent monitoring, object tracking, and environmental perception. In recent years, with the rapid development and popularization of depth sensors, the RGB information and corresponding depth data of a scene can be obtained easily. Taking this as an opportunity, RGB-D visual salient object detection, which is more consistent with the human visual perception system, has gradually become a research focus. However, owing to limitations of depth imaging technology and changes in environmental illumination, the quality of acquired depth maps is often uneven. How to reduce the interference of low-quality depth maps while effectively exploiting depth information, with its rich spatial structure, is therefore a major research difficulty in this field. In addition, benefiting from the breakthroughs of deep learning in computer vision, deep-learning-based RGB-D saliency detection has gradually become the mainstream direction. Although great progress has been made, another major challenge remains: most existing saliency detection methods achieve high-precision detection at the expense of high computational cost and large parameter counts, which seriously restricts their practical application. Improving the fusion quality of multi-modal features and constructing lightweight saliency detection models are therefore of important research significance.

The main goal of this dissertation is to explore RGB-D saliency detection models that better match the human binocular visual perception mechanism. A series of studies is carried out on the effective extraction and fusion optimization of multi-modal features during model construction, as well as on model lightweighting. The main research contents and innovations are summarized as follows:

(1) To address the insufficient multi-modal features and boundary-contact problems of traditional hand-crafted feature algorithms, a multi-stage saliency detection model based on a depth-guided bilateral absorbing Markov chain is proposed. The model progressively integrates and optimizes color and depth information at low, middle, and high levels to extract saliency cues, making full use of both the explicit and implicit properties of depth information. Specifically, the model first explicitly combines color and depth information to construct an initial two-layer sparse graph model, and generates low-level saliency cues from a background prior and a region-contrast prior. A bilateral absorbing Markov chain is then built to compute mid-level saliency maps. At the mid-level, a background-seed screening mechanism and a cross-modal multi-graph learning model are designed to handle the boundary-contact and multi-modal fusion problems, respectively; they refine the initial graph model using the low-level saliency cues, acting on the graph connection pattern and the graph affinity matrix. Non-local connections are also introduced into the graph model to enhance the consistency of salient regions. Finally, a depth-guided optimization module further refines the mid-level results to produce the final saliency map. Quantitative and qualitative comparisons demonstrate the effectiveness and robustness of the proposed model.

(2) To address the difficulty of describing salient objects in complex scenes, an RGB-D saliency detection model based on a progressive guided fusion network is proposed. The model comprises four types of sub-modules that are alternately cascaded in a top-down manner to enhance and optimize the fusion of multi-modal features, gradually mining and integrating valuable information. Specifically, convolutional networks first extract color and depth modal features at multiple layers. A multi-modal multi-scale attention fusion module is then constructed at each layer to fully exploit the complementarity between modalities and across scales, achieving effective feature fusion. Next, to strengthen the semantic expressiveness of shallow features, a multi-modal feature refinement mechanism is constructed in which high-level fused features guide the enhancement of the shallow RGB and depth features. Finally, a residual prediction module further suppresses background elements and predicts the final saliency result. Qualitative and quantitative analyses on the relevant datasets demonstrate the effectiveness and robustness of the algorithm. Besides achieving state-of-the-art performance, the algorithm also handles the RGB-T saliency detection task, based on RGB and thermal infrared data, showing excellent transferability.

(3) To address practical issues such as model complexity, large parameter counts, and heavy computational demands, a lightweight cross-modal awareness network is constructed for RGB-D saliency detection. To improve the representation-learning ability of the lightweight backbone, a mutual attention enhancement module is designed and embedded in the depth-feature encoding stream; it strengthens the representations of RGB and depth features by exploiting deep semantic features and inter-modal correlations. A selective intermodulation fusion module and a high-level guided feature refinement mechanism are further proposed to ensure both the efficiency and the accuracy of the algorithm. Extensive qualitative and quantitative comparisons demonstrate the effectiveness of the proposed modules and the superiority, real-time performance, and robustness of the overall algorithm.
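The dissertation's bilateral absorbing Markov chain is not reproduced here, but the standard absorbing-chain saliency computation it builds on can be sketched briefly. In this family of methods, boundary superpixels act as absorbing states and a random walker's expected absorption time serves as a saliency score: regions far from the background boundary absorb slowly and score high. The function name, toy affinity matrix, and parameter-free normalization below are illustrative assumptions, not the model's actual components.

```python
import numpy as np

def absorbed_time_saliency(affinity, absorbing_idx):
    """Saliency from expected absorption time in an absorbing Markov chain.

    affinity      : (n, n) non-negative affinity between superpixels
    absorbing_idx : indices of absorbing (boundary/background) nodes
    Transient nodes far from the background absorb slowly -> high saliency.
    """
    n = affinity.shape[0]
    absorbing = np.zeros(n, dtype=bool)
    absorbing[absorbing_idx] = True
    transient = ~absorbing

    # Row-normalize the affinity matrix into a transition matrix P.
    P = affinity / affinity.sum(axis=1, keepdims=True)

    # Q: transition probabilities among transient states only.
    Q = P[np.ix_(transient, transient)]

    # Fundamental matrix N = (I - Q)^-1; expected absorbed time y = N @ 1.
    N = np.linalg.inv(np.eye(Q.shape[0]) - Q)
    y = N @ np.ones(Q.shape[0])

    # Normalize to [0, 1] as a saliency score over transient nodes.
    return (y - y.min()) / (y.max() - y.min() + 1e-12)

# Toy example: a 4-node path graph with node 0 absorbing; saliency
# increases with graph distance from the absorbing node.
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)
print(absorbed_time_saliency(A, [0]))
```

The mid-level refinements described above (background-seed screening, the cross-modal multi-graph learning model, non-local connections) would act on the affinity matrix and the graph connection pattern before this computation runs.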
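The cross-modal attention fusion idea recurring in contributions (2) and (3), where each modality's features re-weight the other's before fusion, can be illustrated with a minimal, parameter-free sketch. The real modules use learned convolutional attention at multiple scales; the squeeze-and-excitation-style channel weighting below (global average pooling plus a sigmoid, with no learned MLP) and the function names are stand-in assumptions used only to show the mutual-guidance pattern.

```python
import numpy as np

def channel_attention(feat):
    """SE-style channel weights: global average pool, then sigmoid.
    feat: (C, H, W). Returns (C, 1, 1) weights in (0, 1)."""
    pooled = feat.mean(axis=(1, 2))        # squeeze: global context per channel
    w = 1.0 / (1.0 + np.exp(-pooled))      # excitation stand-in (no learned MLP)
    return w[:, None, None]

def fuse_rgb_depth(rgb_feats, depth_feats):
    """Mutually guided cross-modal fusion, one fused map per scale.
    rgb_feats, depth_feats: lists of (C, H, W) arrays, one per scale."""
    fused = []
    for fr, fd in zip(rgb_feats, depth_feats):
        fr_att = fr * channel_attention(fd)   # depth guides RGB channels
        fd_att = fd * channel_attention(fr)   # RGB guides depth channels
        fused.append(fr_att + fd_att)         # element-wise cross-modal fusion
    return fused
```

In the actual networks, the fused maps would then pass through the top-down refinement path, with high-level fused features guiding the shallow ones, rather than being used directly.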
Keywords/Search Tags: RGB-D salient object detection, cross-modal multi-graph learning model, absorbing Markov chain, multi-modal multi-scale attention, network lightweighting