Multi-modal Image Salient Object Detection Based On Domain Adaptation

Posted on:2022-04-23

Degree:Master

Type:Thesis

Country:China

Candidate:H B Wu

Full Text:PDF

GTID:2518306602465884

Subject:Control theory and control engineering

Abstract/Summary:

Salient object detection aims to detect the region of interest in an image. As a basic computer vision task, it is used in many downstream applications, including image understanding, semantic segmentation, person re-identification, and content-based image compression. In challenging scenes, such as those with insufficient illumination or complex backgrounds, multi-modal images can greatly improve detection performance. In recent years, with the rapid development of deep learning in computer vision, multi-modal salient object detection algorithms trained with high-quality labels have achieved strong performance. However, these algorithms rely mainly on manually labeled multi-modal datasets. Because labeling is tedious and labor intensive, the scale of multi-modal datasets remains limited. To address this problem, we explore how an existing large-scale labeled single-modal dataset can be used to perform salient object detection on unlabeled multi-modal datasets. We present two multi-modal salient object detection algorithms based on domain adaptation, which transfer the knowledge learned on RGB datasets to the same task on multi-modal RGB-D datasets. Both algorithms are verified on several public multi-modal datasets. The main contributions of this thesis are as follows:

(1) To solve the problem of modality inconsistency between single-modal data and multi-modal data, we present a multi-modal image salient object detection method based on image-to-image translation and multi-level feature fusion (ITMFF). First, we generate depth images corresponding to the RGB images through an image-to-image translation method, so that the single-modal data and the multi-modal data have the same number of modalities. Then we design a salient object detection network based on multi-level feature fusion. The network performs feature fusion with a residual cross-modal fusion module, which integrates the complementary information of the two modalities. The fused features promote domain adaptation from single-modal data to multi-modal data, and the method thereby reduces the domain shift between the two.

(2) To address the domain shift between single-modal data and multi-modal data, we present a two-stage multi-modal image salient object detection method based on label generation and optimization (LGO). The algorithm decomposes the domain adaptation problem into two parts: pseudo-label generation based on single-modal domain adaptation, and multi-modal salient object detection based on label optimization. In this way it can predict accurate results for multi-modal images. In the pseudo-label generation stage, we present a pseudo-label generation network based on multi-level adversarial learning. The RGB images in the large-scale labeled single-modal dataset serve as the source domain, and the RGB images in the unlabeled multi-modal dataset serve as the target domain. The prediction for the target domain is obtained through domain adaptation with multi-level adversarial learning and is used as the pseudo-label for the next stage. In the label optimization stage, we construct a multi-modal salient object detection network based on pseudo-label optimization. The network generates multi-modal predictions under the supervision of the pseudo-labels and iteratively optimizes them. The adaptive label optimization mechanism updates the pseudo-labels adaptively for different image samples.

Finally, experimental results on public datasets show the effectiveness of the two proposed algorithms. In multi-modal scenes without annotations, the proposed ITMFF outperforms other supervised methods in accuracy and visual quality. In addition, experiments on three public datasets demonstrate that LGO can effectively predict salient objects in complex scenes.
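The abstract does not specify how the adaptive label optimization mechanism decides when to update a pseudo-label. As an illustration only, the sketch below assumes one plausible rule: a sample's pseudo-label is blended with the network prediction only when that prediction is confident, where confidence is measured as the mean distance of the predicted saliency values from 0.5. The function names, the confidence measure, the blending rule, and the threshold are all hypothetical and are not taken from the thesis.

```python
from typing import List

# A saliency map as nested lists of values in [0, 1] (assumed representation).
Map = List[List[float]]


def confidence(pred: Map) -> float:
    """Mean distance of each pixel from 0.5, rescaled to [0, 1].

    Hypothetical confidence measure; the thesis does not define one here.
    """
    values = [p for row in pred for p in row]
    return sum(abs(p - 0.5) * 2.0 for p in values) / len(values)


def update_pseudo_label(label: Map, pred: Map, threshold: float = 0.6) -> Map:
    """Per-sample update: blend the prediction into the pseudo-label
    only when the prediction is confident enough (assumed rule)."""
    c = confidence(pred)
    if c < threshold:
        # Low-confidence prediction: keep a copy of the current pseudo-label.
        return [row[:] for row in label]
    # Confident prediction: weight it by its confidence score.
    return [
        [(1.0 - c) * l + c * p for l, p in zip(label_row, pred_row)]
        for label_row, pred_row in zip(label, pred)
    ]
```

Under this assumed rule, samples with uncertain predictions keep their stage-one pseudo-labels, while samples with confident predictions have their labels refined at each iteration, which is one way an update can be adaptive per image sample.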

Keywords/Search Tags:

Multi-modal image salient object detection, Domain adaptation, Cross-modal feature fusion, Pseudo-label generation and optimization

Related items

1	Researches On RGB-D Visual Salient Object Detection Algorithms Based On Feature Fusion
2	Research On RGB-D Salient Object Detection Based On Cross-modal Fusion
3	Research On Salient Object Detection Algorithm Of Multi-source Images
4	A Research On Salient Object Detection Based On Multi-Modal Fusion
5	Research On RGB-D Salient Object Detection Guided By Cross-modal Interaction
6	Research On Key Technologies Of Intelligence Target Analysis Based On Swarm Multi-modal Perception
7	Salient Object Detection For RGB-D Images Based On Multi-modal Fusion
8	Research And Application Of Salient Object Detection Based On Deep Learnin
9	Research On Visual Perception Technology Based On Multi-modal Fusion
10	High-fidelity Colorization Of Image Generation