Recently, with the development of deep learning techniques, the semantic segmentation of remote sensing images based on supervised deep learning has achieved great success and is increasingly used in applications such as global mapping and urban built-up area identification. However, the success of deep learning relies heavily on large numbers of high-quality annotated samples. Owing to the high cost of annotation for semantic segmentation tasks and the large spatial and temporal heterogeneity of remote sensing images, the available annotated data cover only a small fraction of remote sensing imagery, so we face a serious shortage of annotated samples. A new learning paradigm, self-supervised learning (SSL), can address this problem by pre-training a general model on a large number of unlabeled images and then fine-tuning it on a downstream task with very few labeled samples. Since SSL can learn the essential characteristics of data directly from unlabeled data, which is easy to obtain in the remote sensing field, it may be of great significance for tasks such as global mapping. This paper therefore focuses on applying self-supervised learning to remote sensing semantic segmentation. The main work of this paper is summarized as follows.

(1) This paper explores the task of remote sensing semantic segmentation from supervised learning to self-supervised learning. Experiments show that segmentation accuracy decreases significantly under existing supervised learning with limited and biased annotated data, and that introducing self-supervised learning improves segmentation accuracy under limited annotation, with contrastive learning showing particularly good performance.

(2) Most existing contrastive learning methods are designed for natural image classification tasks and learn image-level representations, which may be suboptimal for semantic segmentation tasks requiring pixel-level discrimination. We therefore propose a global style and local matching contrastive learning network (GLCNet) for remote sensing image semantic segmentation. Specifically, 1) the global style contrastive learning module learns a better image-level representation, since we consider that style features can better represent the overall characteristics of an image; 2) the local features matching contrastive learning module learns representations of local regions, which benefits semantic segmentation. Experimental results show that this method significantly outperforms other self-supervised methods on remote sensing semantic segmentation datasets.

(3) To address the problem that existing self-supervised learning paradigms are prone to overfitting when transferred to downstream tasks by direct fine-tuning on only limited annotations, this paper introduces unannotated image data for the specific downstream semantic segmentation task, that is, it uses semi-supervised learning to improve performance on that task. Specifically, we propose a class-separability-enhanced self-training method. It uses supervised contrastive learning to enhance inter-class distinction and reduce intra-class inconsistency, while helping the model retain the invariant features learned through self-supervised contrastive learning and correcting the false-negative-sample problem of self-supervised contrastive learning. In addition, the pseudo-label-based self-training module of the method exploits the interaction of information between the teacher model and the student model to iteratively improve model performance.
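The contrastive objectives summarized in (2) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the common InfoNCE formulation of contrastive loss, and it assumes that "style features" are channel-wise mean and standard deviation statistics of a feature map, as in the style transfer literature; the function names are illustrative only.

```python
import numpy as np

def style_vector(feat):
    """Channel-wise mean and std of a feature map of shape (C, H, W) -> (2C,).
    Assumption: the global 'style' of an image is summarized by these
    first- and second-order statistics of its feature channels."""
    mu = feat.mean(axis=(1, 2))
    sigma = feat.std(axis=(1, 2))
    return np.concatenate([mu, sigma])

def info_nce(anchor, positive, negatives, tau=0.1):
    """Standard InfoNCE contrastive loss on L2-normalized vectors:
    pull the positive toward the anchor, push negatives away."""
    def norm(v):
        return v / (np.linalg.norm(v) + 1e-8)
    a, p = norm(anchor), norm(positive)
    negs = np.stack([norm(n) for n in negatives])
    logits = np.concatenate([[a @ p], negs @ a]) / tau
    logits -= logits.max()  # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())
```

In this reading, the global style module would apply `info_nce` to `style_vector` summaries of two augmented views of the same image (positive pair) against views of other images (negatives), while the local matching module would apply the same loss to pooled features of matched local regions across the two views, giving the region-level discrimination that segmentation needs.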
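The teacher-student self-training loop in (3) can likewise be sketched. The abstract does not specify the mechanics, so this sketch assumes two common design choices: confidence-thresholded pseudo-labels from the teacher, and a teacher that tracks the student by exponential moving average (EMA); both are assumptions, not the thesis's stated method.

```python
import numpy as np

def pseudo_labels(teacher_probs, threshold=0.9):
    """Keep only confident teacher predictions as pseudo-labels.
    teacher_probs: (N, C) softmax outputs per pixel or sample.
    Returns predicted labels and a boolean mask selecting the
    confident ones used to supervise the student."""
    conf = teacher_probs.max(axis=1)
    labels = teacher_probs.argmax(axis=1)
    mask = conf >= threshold
    return labels, mask

def ema_update(teacher_w, student_w, momentum=0.99):
    """Teacher weights slowly track the student, so the two models
    exchange information across self-training iterations."""
    return momentum * teacher_w + (1 - momentum) * student_w
```

Iterating this loop (teacher labels unannotated images, student trains on the confident subset plus the limited true annotations, teacher is refreshed from the student) is one plausible realization of the iterative teacher-student interaction the abstract describes; the supervised contrastive term for class separability would be added to the student's loss on the labeled and pseudo-labeled pixels.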