| With the gradual increase of China's urbanization rate, building footprints are growing rapidly. Accurately extracting buildings from high-resolution remote sensing images can provide auxiliary information for urban planning, disaster prevention, smart city construction, and other applications. The rich geometric, textural, and spectral information in such images provides the basis for high-accuracy building extraction. However, different objects can share the same spectrum and the same object can exhibit different spectra, which limits further accuracy gains. Traditional methods rely on hand-designed features, which take considerable time and effort to design so that the representation matches the input data. Although these methods have achieved some results, they are not effective in complex scenes. Deep learning does not require hand-designed features: it learns features automatically and replaces the complex data processing pipeline with an end-to-end model, which has made it the basis of many building extraction methods. This paper studies and analyzes deep-learning-based fully convolutional networks, which are considerably more accurate than traditional methods but still have many problems. To address some of these problems, this paper carries out the following research. (1) Current fully convolutional models for building extraction from high-resolution remote sensing images commonly apply batch normalization and the ReLU activation function to the convolution outputs. ReLU tends to kill neurons, while batch normalization significantly increases the model's training and inference time. To solve this problem, the scheme of using the SELU activation function to replace the ReLU activation function while
eliminating batch normalization is proposed in this paper. To verify the feasibility of the scheme, a comparative experiment is conducted on the WHU building dataset: a test network is designed and used to extract buildings with each of the two combinations (ReLU with batch normalization, and SELU without it). The extraction results of both are analyzed qualitatively and quantitatively, and the results show that replacing ReLU with SELU and eliminating batch normalization significantly improves the building extraction results. (2) The accuracy of building extraction is positively correlated with the number of parameters and the complexity of the fully convolutional network. However, training models with many parameters and high complexity is particularly demanding on computer hardware, especially GPU memory. As a result, many computers cannot train such models, which limits the improvement of accuracy. To solve this problem, this paper proposes a building extraction method based on ensemble learning over lightweight fully convolutional networks. It replaces the training of one large, complex model with the training of multiple lightweight fully convolutional networks with different structures, and combines their extraction results. To obtain strong but diverse lightweight networks, this paper uses the SELU activation function to construct a base convolutional module named conv_block, and proposes three lightweight fully convolutional networks built on it. The first, SC_UNet, is based on cross-layer attention and pyramid pooling. The second, HS_PSPNet, is based on the hierarchical-split block and the scSE attention mechanism. The third, HR_PSPNet, is based on parallel multiscale input and pyramid pooling. Finally, different ensemble learning methods are applied to the results of the three models. The experimental
results show that assigning equal weights to the outputs of the three networks and summing them gives the best ensemble extraction results. Compared with the strongest of the three networks, the ensemble significantly improves the extraction results. (3) When a fully convolutional network extracts buildings from a remote sensing image, the image is first cut into many small tiles, which are then classified separately; this causes serious misclassification at tile edges, where information is incomplete. To solve this problem, this paper uses the existing method named Overlapsize to improve the extraction results. Experimental results on the WHU dataset show that the method significantly reduces boundary misclassification, which demonstrates that it is applicable to high-resolution remote sensing images. (4) The extraction results of the proposed method and other methods on the WHU dataset are analyzed quantitatively; the compared models are S_UNet, SRI_Net, AU_Net, SA_Net, USPP, and SR_FCN. The experimental results show that the proposed method extracts buildings better, achieving 0.954 precision, 0.944 recall, 0.949 F1-score, and 0.903 IoU on the test set; each of these values ranks first among the compared models, which verifies the effectiveness of the method. |
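
As an illustration of the equal-weight ensemble in (2) and the evaluation metrics in (4), the following is a minimal NumPy sketch, not the paper's implementation. It assumes each network outputs a per-pixel building probability map of the same shape, that summing with equal weights is equivalent to averaging, and that a 0.5 threshold produces the final mask. The function names `ensemble_equal_weights` and `building_metrics` and the threshold are hypothetical choices made for this example.

```python
import numpy as np


def ensemble_equal_weights(prob_maps, threshold=0.5):
    """Combine per-pixel building probabilities with equal weights.

    prob_maps: list of arrays of identical shape, one per model
               (assumed to hold probabilities in [0, 1]).
    threshold: assumed decision threshold on the averaged probability.
    Returns a boolean building mask.
    """
    stacked = np.stack(prob_maps, axis=0)   # shape: (n_models, H, W)
    mean_prob = stacked.mean(axis=0)        # equal weight 1/n for each model
    return mean_prob >= threshold


def building_metrics(pred, truth):
    """Pixel-wise precision, recall, F1-score, and IoU for the building class."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)     # building pixels correctly predicted
    fp = np.sum(pred & ~truth)    # background predicted as building
    fn = np.sum(~pred & truth)    # building pixels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": float(precision), "recall": float(recall),
            "f1": float(f1), "iou": float(iou)}


if __name__ == "__main__":
    # Toy 2x2 probability maps standing in for three model outputs.
    a = np.array([[0.9, 0.2], [0.6, 0.4]])
    b = np.array([[0.8, 0.4], [0.3, 0.7]])
    c = np.array([[0.7, 0.3], [0.5, 0.6]])
    truth = np.array([[1, 0], [1, 1]], dtype=bool)

    mask = ensemble_equal_weights([a, b, c])
    print(mask)
    print(building_metrics(mask, truth))
```

For a single class, the F1-score and IoU computed this way are tied by IoU = F1 / (2 - F1), so the two metrics always rank models in the same order.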