With the rapid development of remote sensing technology, aerial observation platforms can now capture remote sensing images at sub-meter spatial resolution, and deep neural networks have been widely studied and applied to remote sensing image scene classification with great success. However, several problems remain. First, most deep learning methods for scene classification require labeled data for training. Labeling must be done by domain professionals, which is time-consuming and laborious, so labeled samples are scarce and cannot keep pace with the rapid growth of remote sensing imagery. Second, popular conventional deep learning models often require complex computation and a large amount of memory, whereas deploying models on resource-constrained devices such as embedded devices, drones, and mobile devices requires attention to both performance and speed. Third, the convolutional neural network (CNN) is currently the most popular model for scene classification, but a growing number of studies show that CNN-based methods try to capture global features by gradually enlarging the receptive field while ignoring long-range contextual information. The more recent Vision Transformer (ViT) can extract contextual features, but its ability to learn local information is limited, and its computational cost increases sharply as the input resolution grows. To address these problems, this paper studies semi-supervised learning, model lightweighting, and image feature extraction. The main contributions are as follows:

(1) A Vision Transformer scene classification algorithm based on semi-supervised learning (SSL) is proposed. It uses an improved FixMatch method to learn from both labeled and unlabeled data, and replaces the original CNN model with the Pyramid Vision Transformer (PVT). Because FixMatch focuses mainly on training with unlabeled data, a Labeled Data Augmented (LDA) strategy is proposed for the labeled data; LDA improves the accuracy of the algorithm without increasing the number of parameters or the amount of computation. PVT offers stronger feature extraction: its Transformer architecture is better able to capture global information, and its pyramid structure extracts multi-scale features.

(2) A lightweight scene classification algorithm based on semi-supervised learning is designed. Building on the semi-supervised learning method and a lightweight network, the FadgeNet model is proposed, which takes into account both the label information and the channel information of remote sensing images. It adopts the EdgeNeXt network trained with the improved FixMatch method. EdgeNeXt effectively combines the advantages of CNN and ViT and significantly reduces the number of parameters compared with ViT. By combining a convolutional encoder with an SDTA encoder, it extracts local information while fully considering global information. In addition, channel attention is added to the EdgeNeXt network, with weights assigned according to the importance of local information.

(3) A lightweight scene classification prototype system based on semi-supervised learning is designed and implemented, and the scene classification methods proposed in this paper are compared on the UCM and NWPU datasets.
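
The FixMatch objective that contributions (1) and (2) build on can be illustrated with a minimal sketch. It follows the standard FixMatch formulation (a supervised cross-entropy term plus a confidence-masked pseudo-label term), not the improved variant or the LDA strategy proposed in this paper, whose details are not given here. The function names, the logits passed in, and the defaults `threshold=0.95` and `lambda_u=1.0` are illustrative assumptions rather than values taken from this work.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Cross-entropy of one prediction against an integer class label."""
    return -math.log(softmax(logits)[target])

def fixmatch_loss(labeled_logits, labels, weak_logits, strong_logits,
                  threshold=0.95, lambda_u=1.0):
    """Standard FixMatch loss (hypothetical helper, not the paper's variant).

    labeled_logits / labels: model outputs and ground truth for labeled images.
    weak_logits / strong_logits: model outputs for the weakly and strongly
    augmented views of the same unlabeled images.
    Returns (total, supervised, unsupervised).
    """
    # Supervised term: mean cross-entropy on the labeled batch.
    sup = sum(cross_entropy(z, y)
              for z, y in zip(labeled_logits, labels)) / len(labels)

    # Unsupervised term: the weak view produces a pseudo-label, which is
    # kept only when its confidence reaches the threshold; the strong view
    # is then trained to predict that pseudo-label.
    unsup_total = 0.0
    for zw, zs in zip(weak_logits, strong_logits):
        probs = softmax(zw)
        confidence = max(probs)
        pseudo_label = probs.index(confidence)
        if confidence >= threshold:
            unsup_total += cross_entropy(zs, pseudo_label)
    # Averaged over the whole unlabeled batch, masked samples included.
    unsup = unsup_total / len(weak_logits)

    return sup + lambda_u * unsup, sup, unsup

if __name__ == "__main__":
    total, sup, unsup = fixmatch_loss(
        labeled_logits=[[2.0, 0.0]], labels=[0],
        weak_logits=[[10.0, 0.0], [0.1, 0.0]],    # second sample is low-confidence
        strong_logits=[[0.0, 0.0], [5.0, 0.0]],
    )
    print(f"total={total:.4f} sup={sup:.4f} unsup={unsup:.4f}")
```

In this toy call only the first unlabeled sample passes the confidence threshold, so the second contributes nothing to the unsupervised term. In practice the logits would come from the backbone (PVT or EdgeNeXt in this paper) applied to augmented image batches.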