Weakly-supervised Semantic Segmentation With Pseudo Label Supervision

Posted on: 2024-06-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L X Ru
Full Text: PDF
GTID: 1528307292460574
Subject: Computer Science and Technology
Abstract/Summary:
Semantic image segmentation is one of the most fundamental research topics in computer vision. It has many real-world applications, such as autonomous driving, remote sensing image analysis, and medical image analysis. With the development of deep learning, deep model-based methods have come to dominate semantic segmentation. However, training a well-performing deep semantic segmentation model usually requires large amounts of data with pixel-level annotations, which are costly to acquire in both time and manpower. To reduce this annotation cost, researchers have in recent years turned to semantic segmentation with cheap image-level labels, i.e., Weakly-Supervised Semantic Segmentation (WSSS), and have made remarkable progress. Many issues in WSSS nevertheless remain open, including the low accuracy of pseudo labels, the high model complexity of multi-stage methods, and the inherent drawbacks of backbones. To tackle these problems, this thesis conducts systematic research on WSSS with image-level labels via pseudo supervision, building on visual word learning, semantic affinity learning, and contrastive learning. The main contributions of this thesis are summarized as follows.

(1) To improve the accuracy of pseudo labels, this thesis proposes a WSSS framework based on Visual Word Learning and Hybrid Pooling (VWL). VWL uses an updatable visual word codebook to encode the feature maps into visual word labels, which are used to guide the training process and thus enforce the integral activation of object regions. To update the codebook, VWL designs two unsupervised strategies, i.e., a learning-based strategy and a memory-based strategy, which update the codebook via back-propagated gradients and online reconstruction. Additionally, based on local max-pooling and global average pooling, VWL aggregates the local discriminative information and the global foreground, thus reducing the impact of background features without loss of object completeness. Within the multi-stage framework, VWL can outperform other methods in terms of both initial pseudo labels and final segmentation results.

(2) VWL still follows a multi-stage WSSS framework, which suffers from high model complexity and low time efficiency. To address these problems, this thesis proposes an end-to-end WSSS framework (AFA). AFA uses a Vision Transformer (ViT) as the backbone to model global feature interaction and generate high-fidelity initial pseudo labels. Inspired by the property that self-attention blocks in ViT can naturally learn pixel-level semantic affinity, AFA learns high-confidence semantic affinity from the self-attention blocks and uses it for label refinement. To further complement the local details of the pseudo labels, a pixel-adaptive refinement module is proposed, which uses low-level pixel information for efficient and adaptive pseudo label refinement. The final pseudo labels are used to supervise the segmentation decoder, thus ensuring both end-to-end training efficiency and segmentation accuracy.

(3) AFA uses a ViT backbone, in which the features of deep layers are often over-smoothed. To tackle this problem, this thesis proposes Token Contrast (ToCo) for WSSS. First, since intermediate layers in ViT retain semantic diversity, ToCo uses the patch token relations from intermediate layers to supervise the final-layer patch tokens, which addresses the over-smoothing issue and helps generate high-quality pseudo labels. Second, to further exploit the strengths of ViT, ToCo contrasts the class tokens of local regions with that of the global image, which enforces the local-global consistency of object regions and sharpens the foreground-background discrepancy, thus improving pseudo label quality. With ToCo, the over-smoothing issue is addressed and the segmentation results are remarkably improved.

In conclusion, this thesis proposes a series of methods for WSSS that effectively address the issues in pseudo labels, training frameworks, and backbones, and that significantly improve semantic segmentation performance and training efficiency.
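The hybrid pooling idea in contribution (1), local max-pooling followed by global average pooling, can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the function name `hybrid_pool`, the non-overlapping window, the default window size, and the cropping of ragged borders are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np


def hybrid_pool(score_map: np.ndarray, window: int = 2) -> np.ndarray:
    """Pool a (C, H, W) class score map into (C,) image-level scores.

    Hypothetical sketch: take the max inside each non-overlapping
    window x window block (local max-pooling), then average the block
    maxima over the whole map (global average pooling).
    """
    if score_map.ndim != 3:
        raise ValueError("score_map must have shape (C, H, W)")
    c, h, w = score_map.shape
    # Assumption: drop ragged borders so H and W divide evenly by the window.
    h2, w2 = h - h % window, w - w % window
    if h2 == 0 or w2 == 0:
        raise ValueError("window is larger than the score map")
    cropped = score_map[:, :h2, :w2]
    # Split each spatial axis into (number of blocks, position inside block).
    blocks = cropped.reshape(c, h2 // window, window, w2 // window, window)
    local_max = blocks.max(axis=(2, 4))   # shape (C, h2/window, w2/window)
    return local_max.mean(axis=(1, 2))    # shape (C,)
```

In this sketch, `window=1` reduces to plain global average pooling, and a window covering the whole map reduces to global max-pooling, so the window size trades off between the two extremes.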
Keywords/Search Tags:Weakly-Supervised Semantic Segmentation, Pseudo Labels, Class Activation Maps, Semantic Segmentation, Computer Vision