With the advancement of Earth observation and remote sensing imaging technology, large numbers of high-resolution remote sensing images are now available, which makes the question of how to make full use of them for intelligent Earth observation particularly pressing. Semantic segmentation has many applications in remote sensing, such as land use mapping, building extraction, road extraction, and vehicle detection. The task of semantic segmentation of remote sensing images is to identify and extract land cover and land use types from high-resolution imagery covering different topography and landforms. To address several challenges in current remote sensing semantic segmentation tasks, this paper carries out the following research.

When applying the DeepLabv3+ algorithm to remote sensing scenes, two problems are found in its atrous spatial pyramid pooling (ASPP): the intervals between the original atrous rates are large, which is not conducive to extracting features from the low-resolution feature map that carries high-level semantic information, and the large number of parameters in the standard convolutions reduces the training efficiency of the network model. This paper therefore proposes the DeepLabv3+_Array ASPP algorithm, whose main change is to replace the ASPP module with an Array-ASPP module. The key improvements of Array-ASPP are a pyramid array of atrous rates spaced in multiples of 2 and the replacement of standard convolutions with depthwise separable convolutions (a minimal sketch of such a module is given below).

To distinguish easily confused categories and account for objects with varied appearance, this paper proposes the DeepLabv3+_TDA&PDA algorithm with multi-scale adaptive feature enhancement, which introduces two attention modules, TDA and PDA, on top of the DeepLabv3+_Array ASPP algorithm. By cascading TDA and PDA, the network adaptively integrates the dependencies of local and global features and aggregates long-range contextual information, thereby improving the feature representation for remote sensing image semantic segmentation. In addition, this paper uses a feature pyramid to fuse high-resolution low-level features, strengthening the network's use of shallow features, and appropriately reduces the upsampling factor applied before feature fusion, so that the final prediction is more accurate.

Because convolution alone cannot model global context well, this paper further proposes the DeepLabv3+_Transformer algorithm, which builds on the previously improved network by combining a convolutional neural network with a vision Transformer in the encoder, i.e., introducing a self-attention mechanism into the encoder design. Because the Transformer module leads to the loss of low-resolution features, another decoder is introduced: the feature map output by the Transformer module and the feature maps from each stage of the backbone network are separately fused and upsampled in a U-shaped structure, and this dual decoder further exploits the high-resolution information of the remote sensing image (a sketch of such a dual decoder is also given below).

To verify the effectiveness and generalization of the proposed algorithms, this paper compares them with other algorithms on the BDCI2017 and UAVid datasets. The experimental results show that, compared with current mainstream algorithms, the proposed algorithms offer clear advantages for remote sensing image semantic segmentation.
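As a concrete illustration of the Array-ASPP design described above, the following PyTorch sketch shows an ASPP variant whose parallel branches use depthwise separable atrous convolutions with a rate array spaced in multiples of 2. The specific rates (2, 4, 6, 8), the channel widths, and the image-pooling branch are assumptions made for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthwiseSeparableAtrousConv(nn.Module):
    """3x3 depthwise atrous convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, in_ch, out_ch, dilation):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=dilation,
                                   dilation=dilation, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return F.relu(self.bn(self.pointwise(self.depthwise(x))))

class ArrayASPP(nn.Module):
    """ASPP variant with atrous rates spaced in multiples of 2 (assumed: 2, 4, 6, 8)
    and depthwise separable convolutions in place of standard convolutions."""
    def __init__(self, in_ch, out_ch=256, rates=(2, 4, 6, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            [DepthwiseSeparableAtrousConv(in_ch, out_ch, r) for r in rates])
        # Image-level pooling branch, as in the original ASPP.
        self.image_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True))
        self.project = nn.Sequential(
            nn.Conv2d(out_ch * (len(rates) + 1), out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True))

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        pooled = F.interpolate(self.image_pool(x), size=(h, w),
                               mode='bilinear', align_corners=False)
        return self.project(torch.cat(feats + [pooled], dim=1))
```

The parameter saving comes from factorizing each 3x3 standard convolution (roughly 9·C_in·C_out weights) into a 3x3 depthwise convolution plus a 1x1 pointwise convolution (roughly 9·C_in + C_in·C_out weights), which is what allows the denser rate array without a large increase in model size.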
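The dual-decoder idea in DeepLabv3+_Transformer can be sketched in the same way. In the sketch below, UShapedDecoder progressively upsamples a low-resolution feature map and fuses it with backbone skip features stage by stage, and DualDecoderHead runs one such decoder on the Transformer branch output and one on the CNN branch output before a shared classifier. The channel sizes, the number of stages, and the elementwise sum used to combine the two decoders are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UShapedDecoder(nn.Module):
    """U-shaped decoder branch: upsample a low-resolution feature map and fuse it
    with higher-resolution backbone skip features, stage by stage."""
    def __init__(self, in_ch, skip_chs=(1024, 512, 256), out_ch=256):
        super().__init__()
        self.blocks = nn.ModuleList()
        ch = in_ch
        for skip_ch in skip_chs:
            self.blocks.append(nn.Sequential(
                nn.Conv2d(ch + skip_ch, out_ch, kernel_size=3, padding=1, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True)))
            ch = out_ch

    def forward(self, x, skips):
        # skips: backbone stage features, ordered from low to high resolution,
        # with channel counts matching skip_chs.
        for block, skip in zip(self.blocks, skips):
            x = F.interpolate(x, size=skip.shape[-2:], mode='bilinear',
                              align_corners=False)
            x = block(torch.cat([x, skip], dim=1))
        return x

class DualDecoderHead(nn.Module):
    """Assumed wiring: one decoder consumes the Transformer branch output, the other
    the CNN/ASPP branch output; their results are summed before the classifier."""
    def __init__(self, trans_ch, cnn_ch, skip_chs, num_classes):
        super().__init__()
        self.trans_decoder = UShapedDecoder(trans_ch, skip_chs)
        self.cnn_decoder = UShapedDecoder(cnn_ch, skip_chs)
        self.classifier = nn.Conv2d(256, num_classes, kernel_size=1)

    def forward(self, trans_feat, cnn_feat, skips):
        fused = self.trans_decoder(trans_feat, skips) + self.cnn_decoder(cnn_feat, skips)
        return self.classifier(fused)
```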