| Semantic segmentation of remote sensing images is a research hotspot in remote sensing, aiming at accurate recognition of ground objects, and researchers continue to explore new methods to improve its accuracy and efficiency. Convolutional neural network models are currently widely used for this task, but their limited ability to exploit the contextual features of remote sensing images constrains further gains in segmentation accuracy. In 2020, the Transformer model achieved excellent results in computer vision thanks to its strong ability to capture long-range features, and researchers subsequently introduced it into remote sensing image segmentation. However, research on the segmentation performance of Transformer models on remote sensing images is still in its infancy. Further exploration is therefore needed to answer the following questions: Are Transformer models suitable for remote sensing image segmentation? Which Transformer model is better suited to the task, especially for high spatial resolution multispectral remote sensing images, and how do different Transformer models perform in segmentation? 
What are the advantages and disadvantages of Transformer models compared with convolutional neural network models in remote sensing image segmentation? To explore these questions, this study selected three Transformer models with different feature extraction methods (SETRnet, Swin Unet, and Trans Unet) and used the Vaihingen and Potsdam datasets as experimental data to compare and analyze them comprehensively from three aspects: segmentation results, segmentation accuracy, and segmentation efficiency. To further analyze the segmentation performance of Transformer models, three convolutional neural network models (Deeplab V3+, Unet, and MAnet) were included as a comparison group. The experimental results show the following. (1) Transformer models are suitable for remote sensing image segmentation, but their segmentation performance differs significantly across remote sensing datasets of different scales. (2) On the small-scale Vaihingen dataset, Trans Unet had the highest Kappa, MIoU, and OA among the Transformer models, at 80.54%, 56.25%, and 85.55%, respectively. This indicates that Trans Unet performs better on small-scale remote sensing datasets and handles the edge and detail features of objects better. On the large-scale Potsdam dataset, Swin Unet had the highest Kappa, MIoU, and OA among the Transformer models, at 76.47%, 63.62%, and 85.01%, respectively. Swin Unet showed better global semantic interaction and pixel-level segmentation prediction ability on large-scale datasets and had the best segmentation performance. Compared with Trans Unet and Swin Unet, SETRnet is not suitable for remote sensing image segmentation. (3) Compared with convolutional neural network models, Transformer models have stronger feature extraction capabilities for large-scale remote sensing datasets and object features. However, Transformer models are not as good as 
convolutional neural network models at edge feature extraction, and their spatial feature learning ability needs further improvement. In addition, Transformer models require longer training time than convolutional neural network models, which makes them more costly to apply in practice. Finally, based on Transformer models, this paper developed an online prototype platform for remote sensing image segmentation, which gives researchers a convenient and fast way to segment remote sensing images. This experimental study aims to promote the application of Transformer models in remote sensing image segmentation and to advance research in this field. |
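
The three accuracy measures reported above (Kappa, MIoU, and OA) can all be derived from a single pixel-level confusion matrix. The Python sketch below illustrates their standard definitions. It is not the study's evaluation code: the function names `confusion_matrix` and `segmentation_metrics` are hypothetical, labels are assumed to be integers in `[0, num_classes)`, and classes that appear in neither the reference nor the prediction are assumed to be excluded from the MIoU average.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels; rows are reference classes, columns are predicted classes."""
    y_true = np.asarray(y_true, dtype=np.int64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.int64).ravel()
    # Encode each (reference, prediction) pair as one index and count occurrences.
    pair_index = y_true * num_classes + y_pred
    counts = np.bincount(pair_index, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def segmentation_metrics(cm):
    """Return OA, MIoU, and Cohen's Kappa as fractions (multiply by 100 for %)."""
    cm = np.asarray(cm, dtype=np.float64)
    total = cm.sum()
    correct = np.diag(cm)            # correctly labelled pixels per class
    row_sums = cm.sum(axis=1)        # reference pixels per class
    col_sums = cm.sum(axis=0)        # predicted pixels per class

    # Overall accuracy: share of all pixels that are labelled correctly.
    oa = correct.sum() / total

    # Per-class IoU = TP / (TP + FP + FN); absent classes are skipped (assumption).
    union = row_sums + col_sums - correct
    present = union > 0
    miou = (correct[present] / union[present]).mean()

    # Kappa compares OA with the agreement expected by chance.
    expected = (row_sums * col_sums).sum() / total ** 2
    kappa = (oa - expected) / (1.0 - expected) if expected < 1.0 else float("nan")

    return {"OA": oa, "MIoU": miou, "Kappa": kappa}


# Toy example with four pixels and two classes.
cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
metrics = segmentation_metrics(cm)
```

Because Kappa discounts chance agreement and MIoU weights every class equally, both can fall well below OA when the classes are imbalanced, which is why the three are usually reported together.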