The space-based segment of the China High-resolution Earth Observation System is essentially complete, and China's Earth observation has entered a high-resolution era. A growing volume of high-resolution remote sensing images will be captured and widely used in urban construction planning, natural disaster monitoring, land and resource surveys, and military security. Classifying these images efficiently and accurately is the key to applying them further in production and daily life. Semantic segmentation, a computer vision task, closely matches remote sensing image classification and offers a new approach to this problem. With the rapid development of deep learning, deep learning-based semantic segmentation has become a frontier research topic in computer vision and an important technical route to using remote sensing images efficiently. Although deep learning-based methods have made significant progress in the semantic segmentation of natural images, they still face many difficulties with high-resolution remote sensing images, in which ground objects are distributed in complex ways. This dissertation focuses on four problems of ground objects in high-resolution remote sensing images: large differences in scale, class imbalance, complex distribution, and blurred contours. It proposes a series of solutions to the shortcomings of current deep learning methods in representing multi-scale features effectively, mining hard samples and regions, and extracting edges precisely. Following this technical route, the main research contents and contributions of this dissertation are as follows:

(1) To address the large differences in scale among ground objects in remote sensing images, this dissertation proposes a hierarchical context aggregation network for semantic segmentation. By adding a multi-scale feature extraction module to the skip connections of the U-shaped encoder-decoder, the network compensates for the encoder-decoder's lack of explicit multi-scale feature
extraction. At the same time, a multi-scale feature aggregation module is embedded in the middle layer of the decoder, providing an effective path for aggregating multi-scale local contextual features with global features. Finally, the U-shaped network structure aggregates multi-scale features at multiple semantic levels to represent multi-scale objects effectively. Experimental results show that the network effectively improves semantic segmentation accuracy and achieves state-of-the-art performance on two high-resolution remote sensing image semantic segmentation datasets.

(2) To address class imbalance, this dissertation proposes a Calibrated Focal Loss and a dual-decoder semantic segmentation network with decision calibration. It reduces the class imbalance problem to an imbalance between the numbers of easy-to-classify and hard-to-classify samples. The proposed loss quantitatively measures the prediction confusion of each sample and, on top of Focal Loss, adds a cross-entropy calibration term weighted by that confusion. This calibration term forces the network to pay more attention to confused samples, thereby improving model performance. Drawing on the idea of model ensembling, the dual-decoder network with decision calibration adopts a two-decoder structure. An auxiliary decision calibration loss, which measures the difference between the two decoders' predictions, is added to the loss function to steer optimization toward samples on which the predictions differ widely. Experimental results show that both methods effectively improve the model's prediction accuracy.

(3) To address the complex and unbalanced distribution of ground objects, this dissertation proposes a dual-branch adaptive hard region mining semantic segmentation network, which
contains a multi-scale semantic branch and an adaptive hard region mining branch. After the semantic branch is constructed, auxiliary predictions are obtained from different output layers of the backbone network, and the entropy of these predictions is used to measure classification uncertainty. A gating mechanism for hard region mining is then constructed; it significantly enhances the features of hard regions and suppresses those of easy regions. Based on this gating mechanism, this dissertation adds a pyramid-structured adaptive hard region mining branch to the semantic branch to extract the features of hard regions where ground objects are distributed in complex ways. Experimental results show that this method significantly improves semantic segmentation accuracy.

(4) To address blurred object contours, this dissertation proposes an explicit boundary extraction network based on a gating mechanism. Most current deep neural networks use convolution kernels to learn edge features directly from the intermediate layers of the backbone network. However, these methods operate somewhat blindly and rely heavily on edge supervision. To eliminate this blindness, this dissertation proposes an explicit boundary extraction gating mechanism based on the prediction confusion map, which filters out the information in the feature maps other than edges. On this basis, this dissertation constructs an explicit edge extraction network. The network can serve as an edge branch of the dual-branch adaptive hard region mining network to compensate for that network's weak representation of object boundaries. Experimental results show that adding this edge branch effectively improves model performance and ultimately achieves state-of-the-art performance.
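
As an illustration of the entropy-based uncertainty measure and gating described in contribution (3), the following is a minimal sketch rather than the dissertation's implementation. It assumes per-pixel class probabilities from an auxiliary prediction, and the gate form `1 + alpha * (2u - 1)`, the parameter `alpha`, and all function names are hypothetical choices made only for this example.

```python
import math


def normalized_entropy(probs):
    """Shannon entropy of one pixel's class distribution, scaled to [0, 1].

    Expects at least two classes; zero probabilities contribute nothing.
    """
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def hard_region_gate(probs, alpha=0.5):
    """Hypothetical gate: above 1 when normalized entropy u > 0.5, below 1 otherwise.

    alpha in [0, 1] sets how strongly the gate deviates from 1.
    """
    u = normalized_entropy(probs)
    return 1.0 + alpha * (2.0 * u - 1.0)


def apply_gate(features, prob_map, alpha=0.5):
    """Scale each pixel's feature vector by the gate computed from its prediction."""
    return [
        [hard_region_gate(probs, alpha) * value for value in feature]
        for feature, probs in zip(features, prob_map)
    ]


# Two pixels with identical features but different auxiliary predictions.
easy_pixel = [0.97, 0.01, 0.01, 0.01]  # confident prediction, low entropy
hard_pixel = [0.25, 0.25, 0.25, 0.25]  # uniform prediction, maximum entropy
gated = apply_gate([[1.0, 2.0], [1.0, 2.0]], [easy_pixel, hard_pixel])
print(gated)
```

In this sketch the features of the uncertain pixel are amplified and those of the confident pixel are attenuated, which mirrors the stated goal of enhancing hard regions and suppressing easy ones. A learned gate inside a network would likely take a different form.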