| With the improvement of remote sensing image resolution and the development of information extraction technology, remote sensing image scene classification has shifted from pixel-oriented to object-oriented classification. However, traditional methods based on hand-crafted features rely on the prior knowledge of experts, and the mid- and low-level features they extract have limited discriminative power, which leads to poor results on high-resolution remote sensing images with high-level semantics. In recent years, convolutional neural networks (CNNs) have been widely used for scene classification of high-resolution remote sensing images because of their powerful automatic feature extraction ability. However, CNN-based remote sensing scene classification still suffers from problems such as dependence on large amounts of labeled training data, feature redundancy, and parameter redundancy. Therefore, this paper combines transfer learning (TL) for model training, designs an improved model based on an attention mechanism and depthwise separable convolution (DSC), and applies it to remote sensing image scene classification. The main research work and results of this paper are as follows: (1) A dedicated classification layer is designed to solve the problem that, when fine-tuning with transfer learning, the input image size and the classification layer are inconsistent with the pre-trained model, so the model cannot be trained. The classification layers of the pre-trained models used are all designed for the 1,000-class classification task of the ImageNet dataset, and the design of a classification layer depends on the size of the input image. For the UC Merced Land Use (UCM) dataset used here, which has 21 classes of ground objects and images of 256 × 256 pixels, a classification layer consisting of a flatten layer, a dropout layer, and two fully connected layers is designed so that the parameters of the pre-trained model can be transferred and reused. The loss
of input image information is reduced, and the experimental results show that the classification accuracy reaches 92% to 95.71%. (2) The data are converted into TFRecord files, which solves the problem that the TIFF-format data cannot be decoded normally by the API and speeds up data loading for the model. TensorFlow 2.0 has no dedicated decoding API for the TIFF-format UCM dataset. In this paper, the three-dimensional image data and the label of each image are written into a TFRecord file, and the data are decoded with the corresponding decoding API. The TFRecord file contains all the information in the dataset. Compared with the usual image-by-image decoding method, efficiency is significantly improved, which also makes model training more convenient. (3) Online data augmentation is applied to the UCM dataset to prevent underfitting during model training and to reduce memory usage. Offline data augmentation increases the actual size of the dataset and therefore its memory usage. To address this, this paper uses online augmentation: each batch of input images is randomly rotated, flipped, and adjusted in brightness to increase the amount of training data. The model reaches the expected training state without changing the amount of data in the folder. (4) An improved model based on the convolutional block attention mechanism is designed, which solves the feature redundancy problem caused by indiscriminate feature extraction and improves the efficiency and accuracy of feature extraction. A convolutional neural network extracts image features indiscriminately, so some features are extracted repeatedly, which greatly degrades model performance. The convolutional block attention mechanism assigns weights along the channel and spatial dimensions and extracts features selectively, which greatly improves the efficiency of the network model and improves the accuracy by 0.51% compared to the
original model. (5) Depthwise separable convolution (DSC) is used to replace standard convolution, which solves the problem of parameter redundancy in the network model. Too many parameters do not help improve the training and classification accuracy of the model, and most of the redundant parameters are concentrated in the convolution kernels. Compared with a standard convolution kernel, DSC has fewer parameters. The improved VGG16 model based on DSC reduces the parameter count by millions and reaches a classification accuracy of 95.95%, which is 3.13% higher than the original VGG16 model. After the convolution kernel, the model accuracy can also reach 93.10%. |
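
The parameter saving claimed for DSC in item (5) can be illustrated with a short count. The following is a minimal sketch, not the paper's actual model: it assumes the standard VGG16 layout of thirteen 3 × 3 convolutional layers, assumes every one of them is replaced by a depthwise separable layer with a single bias vector, and ignores the classification layers. The names `VGG16_CONV_CHANNELS`, `standard_conv_params`, and `separable_conv_params` are hypothetical helpers introduced only for this illustration.

```python
# (input channels, output channels) of the 13 conv layers in the standard VGG16.
# Assumption: the paper's improved model may replace a different subset of layers.
VGG16_CONV_CHANNELS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]


def standard_conv_params(c_in, c_out, k=3):
    """Weights of a k x k standard convolution plus one bias per output channel."""
    return k * k * c_in * c_out + c_out


def separable_conv_params(c_in, c_out, k=3):
    """Depthwise k x k filter per input channel, then a 1 x 1 pointwise
    convolution that mixes channels, plus one bias per output channel."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise + c_out


def total_params(count_fn):
    """Sum a per-layer parameter count over all VGG16 conv layers."""
    return sum(count_fn(c_in, c_out) for c_in, c_out in VGG16_CONV_CHANNELS)


standard_total = total_params(standard_conv_params)
separable_total = total_params(separable_conv_params)

print("standard conv parameters: ", standard_total)
print("separable conv parameters:", separable_total)
print("reduction:                ", standard_total - separable_total)
```

Under these assumptions the convolutional part shrinks by well over a million parameters, which is consistent in order of magnitude with the million-level reduction reported above. The exact figure depends on which layers are actually replaced.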