
Multi-modal Image Salient Object Detection Based On Domain Adaptation

Posted on: 2022-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: H B Wu
GTID: 2518306602465884
Subject: Control theory and control engineering

Abstract/Summary:
Salient object detection aims to detect the regions of interest in an image. As a basic computer vision task, it has been used in many applications, including image understanding, semantic segmentation, person re-identification, and content-based image compression. In challenging scenes, such as those with insufficient illumination or complex backgrounds, multi-modal images can greatly improve detection performance. In recent years, with the rapid development of deep learning in computer vision, multi-modal salient object detection algorithms trained with high-quality labels have achieved strong performance. However, these algorithms rely mainly on manually labeled multi-modal datasets. Because labeling is tedious and labor intensive, the scale of multi-modal datasets remains limited. To address this problem, we explore using existing large-scale labeled single-modal datasets to achieve salient object detection on unlabeled multi-modal datasets. To that end, we present two multi-modal salient object detection algorithms based on domain adaptation, which transfer the knowledge learned on RGB datasets to the same task on multi-modal RGB-D datasets. Both algorithms are verified on several public multi-modal datasets. The main contributions of this thesis are as follows:

(1) To solve the problem of modality inconsistency between single-modal data and multi-modal data, we present a multi-modal image salient object detection method based on image-to-image translation and multi-level feature fusion (ITMFF). First, we generate depth images corresponding to the RGB images through an image-to-image translation method, so that the single-modal data and the multi-modal data have the same number of modalities. Then we design a salient object detection network based on multi-level feature fusion. Within this network, a residual cross-modal fusion module performs the feature fusion by integrating the complementary information of the two modalities. The fused features promote domain adaptation from single-modal data to multi-modal data. As a result, the method reduces the domain shift between single-modal data and multi-modal data.

(2) To address the domain shift between single-modal data and multi-modal data, we present a two-stage multi-modal image salient object detection method based on label generation and optimization (LGO). The algorithm decomposes the domain adaptation problem into two stages, pseudo-label generation based on single-modal domain adaptation and multi-modal salient object detection based on label optimization, and can produce accurate predictions for multi-modal images. In the pseudo-label generation stage, we present a pseudo-label generation network based on multi-level adversarial learning. The RGB images in the large-scale labeled single-modal dataset serve as the source domain, and the RGB images in the unlabeled multi-modal dataset serve as the target domain. The prediction for the target domain is obtained through domain adaptation with multi-level adversarial learning and is used as the pseudo-label for the next stage. In the label optimization stage, we construct a multi-modal salient object detection network based on pseudo-label optimization. The network generates multi-modal predictions under the supervision of the pseudo-labels and iteratively optimizes them. An adaptive label optimization mechanism updates the pseudo-labels differently for different image samples.

Finally, experimental results on public datasets show the effectiveness of the two proposed algorithms. The proposed ITMFF outperforms other supervised methods in accuracy and visual quality in multi-modal scenes without annotations. In addition, experiments on three public datasets demonstrate that LGO can effectively predict salient objects in complex scenes.
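
The abstract does not give the update rule used in the label optimization stage, so the following is only an illustrative sketch of how a per-sample adaptive pseudo-label update might look. Everything in it is an assumption rather than the thesis's method: the names `confidence` and `update_pseudo_label`, the `max_rate` parameter, and the rule that scales the update by how far the prediction is from 0.5 are all hypothetical.

```python
from typing import List

Mask = List[List[float]]  # saliency map with values in [0, 1]


def confidence(pred: Mask) -> float:
    """Mean distance of each pixel from 0.5, rescaled to [0, 1].

    Hypothetical confidence measure: 1.0 for a fully binary map,
    0.0 for a map that is 0.5 everywhere.
    """
    pixels = [p for row in pred for p in row]
    return sum(abs(p - 0.5) for p in pixels) / len(pixels) * 2.0


def update_pseudo_label(label: Mask, pred: Mask, max_rate: float = 0.5) -> Mask:
    """Blend the current pseudo-label toward the network prediction.

    The blend rate is scaled by the prediction's confidence, so each
    image sample is updated by a different amount (assumed rule, not
    taken from the thesis).
    """
    rate = max_rate * confidence(pred)
    return [
        [(1.0 - rate) * l + rate * p for l, p in zip(l_row, p_row)]
        for l_row, p_row in zip(label, pred)
    ]


if __name__ == "__main__":
    label = [[0.0, 1.0], [0.0, 1.0]]
    confident_pred = [[1.0, 1.0], [0.0, 1.0]]
    uncertain_pred = [[0.5, 0.5], [0.5, 0.5]]
    # A confident prediction moves the pseudo-label toward itself.
    print(update_pseudo_label(label, confident_pred))
    # An uncertain prediction leaves the pseudo-label unchanged.
    print(update_pseudo_label(label, uncertain_pred))
```

In an iterative scheme of this kind, the updated pseudo-labels would supervise the next training round of the multi-modal network, so samples with confident predictions are refined faster than ambiguous ones.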
Keywords/Search Tags: Multi-modal image salient object detection, Domain adaptation, Cross-modal feature fusion, Pseudo-label generation and optimization