High-resolution remote sensing image scene classification is an important application of remote sensing technology. In recent years, it has played an increasingly important role in fields such as natural hazard detection, environmental monitoring, and urban planning. Feature extraction is a key step in scene classification. Current deep learning methods, represented by convolutional neural networks, extract image features automatically and use large numbers of samples for end-to-end training and prediction, and they classify scenes significantly more accurately than traditional methods. However, most existing deep scene classification methods obtain the image semantics that discriminate scene categories through hierarchical convolution alone; the extracted features are of a single type, and these methods cannot focus on the salient features or key regions of remote sensing images. This paper studies feature enhancement for scene classification and designs three remote sensing image scene classification methods based on it. The main research contents and results are as follows:

1. To address the many types of ground objects, the complex spatial layouts, and the large scale variations of key objects, a saliency feature fusion enhanced network is designed for scene classification. First, an independent spatial attention layer at a lower level of the backbone network highlights the key object regions, and an independent channel attention layer at an upper level of the backbone network enhances the feature response of key objects. Next, multi-scale feature maps from different depths are fused to improve the representation ability of the features used for classification. Finally, an auxiliary classifier, used only in the training stage, increases the gradient information available to back propagation and provides a regularization effect. Because this method effectively improves the salient feature representation of remote sensing images, its overall accuracy exceeds that of other scene classification methods by 0.10% to 6.25% on the UC Merced dataset and by 1.29% to 10.35% on the AID dataset.

2. To address the insufficient context information in features extracted from the high-level convolution layers of a convolutional neural network, a high-level feature interaction enhanced network is designed for scene classification. First, the backbone network extracts the initial feature representation of a remote sensing image, and a spatial interaction branch built from an improved Transformer model enhances this representation from the spatial perspective. At the same time, a channel interaction branch composed of local cross-channel attention without dimensionality reduction enhances the features from the channel perspective. The enhanced features from the two branches are then fused to obtain rich features for scene classification. Finally, label smoothing regularization further improves the generalization ability of the network. Because the spatial interaction branch introduces context information into the high-level feature map and the channel interaction branch adaptively enhances beneficial features, the method makes the features more effective for scene classification. Compared with other scene classification methods, its overall accuracy is higher by 3.18% to 10.59% on the RSSCN7 dataset and by 1.23% to 10.29% on the AID dataset.

3. To address the insufficient semantic information in features extracted from the low-level convolution layers of a convolutional neural network, a scene classification method based on cross-scale semantic enhancement is designed. First, a top-down non-local interaction method integrates the semantic information of the high-level feature map into the low-level feature map to enhance the semantics of the latter. The enhanced low-level feature map is then fused with the high-level feature map to obtain features with richer context information for classification. Next, a residual self-attention auxiliary branch is designed for use only during the training stage; classifying with the output of this branch provides additional regularization. Finally, label smoothing regularization further improves the generalization ability of the network. Because this method effectively enhances the semantic information of the low-level feature map used for fusion, its overall accuracy exceeds that of other scene classification methods by 3.32% to 10.73% on the RSSCN7 dataset, by 1.45% to 10.51% on the AID dataset, and by 0.25% to 15.22% on the NWPU-RESISC45 dataset.
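
As an illustration of the label smoothing regularization used in the second and third methods, the following is a minimal sketch in plain Python. It assumes the common uniform-mixture formulation, in which the one-hot target is mixed with a uniform distribution over the classes. The function names, the smoothing factor `epsilon=0.1`, and the example logits are hypothetical choices for illustration; the abstract does not state which variant or value the paper uses.

```python
import math


def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Mix a one-hot target with a uniform distribution (assumed formulation)."""
    off_value = epsilon / num_classes
    target = [off_value] * num_classes
    target[true_class] += 1.0 - epsilon
    return target


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    max_logit = max(logits)
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_smoothing_cross_entropy(logits, true_class, epsilon=0.1):
    """Cross-entropy between the smoothed target and the softmax prediction."""
    probs = softmax(logits)
    target = smooth_labels(true_class, len(logits), epsilon)
    return -sum(t * math.log(p) for t, p in zip(target, probs))


if __name__ == "__main__":
    # Hypothetical logits for a three-class problem; class 0 is the true class.
    logits = [4.0, 1.0, 0.5]
    print("smoothed target:", smooth_labels(0, 3))
    print("plain loss:     ", label_smoothing_cross_entropy(logits, 0, epsilon=0.0))
    print("smoothed loss:  ", label_smoothing_cross_entropy(logits, 0, epsilon=0.1))
```

With `epsilon=0.0` the function reduces to ordinary cross-entropy. With a positive `epsilon`, a confident prediction on the true class is penalized slightly more, which discourages over-confident outputs and is the usual motivation for using label smoothing as a regularizer.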