| Hyperspectral remote sensing is a passive remote sensing technology that obtains information on the earth’s surface. Hyperspectral images can be captured by hyperspectral cameras on various platforms such as satellites, aircraft, and drones. Hyperspectral image classification has long been an important task and has attracted much attention in the field of remote sensing. In recent years, with the development of deep learning, methods based on deep neural networks have gradually come to dominate hyperspectral image classification. However, several problems in hyperspectral classification remain unsolved. Firstly, feature extraction based on the Convolutional Neural Network (CNN) is not robust to image rotation: the extracted features change as the image rotates. Secondly, deep neural networks based on Transformers ignore how much local regions differ in their discriminative value for category information when processing serialized images, resulting in redundant information. Thirdly, when processing hyperspectral images, Transformer networks still follow an approach similar to that used for conventional visible-light images and do not fully utilize the spectral information present in hyperspectral images. To address these problems, this dissertation conducts a series of studies on the potential of the attention mechanism in deep neural networks, covering the following three aspects. (1) Cross-Attention Spectral–Spatial Network. In current methods based on convolutional neural networks, 2D or 3D convolution is the basic operation for extracting spatial or spectral-spatial features. However, these convolutional operations are sensitive to image rotation, so the extracted features are likely to change as the input image rotates. To this end, a Cross-Attention Spectral–Spatial Network (CASSN) is proposed in this dissertation. In the CASSN, a cross-spatial attention component is proposed to generate rotation-invariant image features. For the rich spectral information of
hyperspectral images, a cross-spectral attention component is proposed to suppress redundant information in spectral bands during feature extraction and to focus on key spectral information related to the classification task. Experiments on multiple remote sensing hyperspectral classification datasets verify the effectiveness of the proposed cross-spatial and cross-spectral attention components. (2) Local Selection Vision Transformer. In mainstream remote sensing classification methods based on convolutional neural networks, multiple convolutional layers must be stacked to enlarge the receptive field, extract global features, and improve the discriminative power of the network. However, stacking convolutional layers introduces a large number of additional parameters. Considering that the Vision Transformer (ViT) has shown powerful performance in modeling global sample relationships, a Local Selection Vision Transformer (LSViT) network is proposed in this dissertation. The network contains a local selection module, which uses an attention mechanism to assign different attention weights to serialized image patches. As a consequence, patches from image regions that are highly correlated with image category information are selected, while regions weakly correlated with category information are suppressed in the features. Experiments on a large number of remote sensing image datasets show that the proposed LSViT effectively learns the global feature relationships of large-scale images. The performance of LSViT is comparable to that of state-of-the-art methods. (3) Multi-Level Attention based Hybrid Network. Transformers have achieved excellent results in image analysis thanks to their powerful ability to model global sample correlations. However, restricted by the network’s input form, the image has to be converted into a series of image patches. In large-scale remote sensing images, this division of image
patches makes it difficult to effectively extract features of small-scale land cover. Considering the advantage of convolutional neural networks in local information extraction, a Multi-Level Attention based Hybrid Network (MAHN) is proposed in this dissertation. The MAHN combines the respective advantages of convolutional neural networks and Transformers in feature extraction, and simultaneously mines local and global information from large-scale, complex remote sensing images. In addition, a multi-level attention mechanism is introduced at different scales from shallow to deep, and convolutional features are used to guide the Transformer’s self-attention module to focus on key details in the image area. Extensive experiments on hyperspectral image classification datasets verify the effectiveness of the MAHN, showing that the network achieves state-of-the-art classification accuracy. |
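
The local selection idea described in (2), assigning attention weights to serialized image patches and keeping those most correlated with the category information, can be illustrated with a minimal sketch. This is not the LSViT implementation: the scoring rule (scaled dot product against a single query vector), the function name `select_patches`, the `keep` parameter, and the toy embeddings are all assumptions made purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_patches(patches, query, keep):
    """Score each patch embedding against a query vector and keep the top `keep`.

    Hypothetical helper: the scaled dot-product scoring and hard top-k selection
    are assumptions, not the module defined in the dissertation.
    """
    dim = len(query)
    # Scaled dot product between each patch embedding and the query.
    scores = [
        sum(p_i * q_i for p_i, q_i in zip(patch, query)) / math.sqrt(dim)
        for patch in patches
    ]
    weights = softmax(scores)
    # Rank patches by attention weight, highest first.
    ranked = sorted(range(len(patches)), key=lambda i: weights[i], reverse=True)
    # Keep the top-`keep` indices, restored to their original sequence order.
    kept = sorted(ranked[:keep])
    return kept, weights


# Toy 2-D embeddings for four serialized patches (made-up values).
patches = [
    [0.9, 0.1],
    [0.1, 0.2],
    [0.8, 0.7],
    [0.0, 0.1],
]
query = [1.0, 1.0]  # stands in for a learned class-related query (assumption)

kept, weights = select_patches(patches, query, keep=2)
print(kept)  # -> [0, 2]
```

In this toy run, the two patches whose embeddings align best with the query are retained and the other two are dropped. A trained network would learn the query and the embeddings, and might down-weight weakly correlated patches rather than discard them outright.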