| With the continuous progress of remote sensing imaging technology, large numbers of remote sensing images are now used in smart agriculture, urban road planning, key target positioning, and many other fields. Remote sensing images are affected by shooting angle and shooting time: inconsistent viewing angles produce texture and shape differences within the same scene, while shooting time and natural environmental factors such as illumination cause ground objects of the same type to differ considerably in color. Variation in the distance between the sensor and the measured ground object also leads to scale diversity in the acquired images. As a result, remote sensing image data sets exhibit large intra-class differences and high inter-class similarity, which makes it difficult to extract deep, distinctive semantic information from remote sensing images. To address these two problems, two models are proposed: the Multi-scale Attention CNN (MSA-CNN), based on attention-network scale feature fusion, and the Multi-scale Stacking Attention Pooling CNN (MSAP-CNN). The main research contents and results are as follows: (1) To address the low classification accuracy caused by the large intra-class differences and high inter-class similarity of remote sensing data sets, a remote sensing image scene classification model, MSA-CNN, based on attention-network scale feature fusion is proposed. First, the remote sensing images are preprocessed and input into a VGG-16 network to extract multi-scale features, and the multi-selected box attention model MS-APN is used to extract multi-scale target regions of the images. The target regions are cut and 
amplified and input into the three-layer network structure. Then, the multi-scale features of the original image and the features of the target regions are fused and input into the local binary model to extract texture features at different scales, which enhances the model’s ability to recognize information in different directions. Finally, the texture features at different scales are fused and input to the fully connected layer of the network to complete the final classification and prediction task. Experimental results show that on the NWPU-RESISC45 open data set, the average classification accuracy of MSA-CNN is 1.63% and 2.66% higher than that of ARCNet and RA-CNN, respectively; on the UC Merced Land-Use open data set, it is 0.64% higher than that of RA-CNN. These results show that MSA-CNN, which is designed to handle the scale and angle differences of remote sensing images, can effectively improve the accuracy of remote sensing image scene classification. (2) To address the classification difficulty caused by the richness and similarity of objects in the target regions of remote sensing images, and by the coexistence and mutual correlation of multiple objects, a remote sensing scene classification model based on multiple-instance-learning scale feature fusion, called MSAP-CNN, is proposed. MSAP-CNN first extracts multi-scale convolutional feature maps from the remote sensing images through a set of multi-scale convolutional feature extractors; the resulting feature maps are then stacked together and input into the instance classifier. The instance classifier consists of N 1x1 convolution filters (where N equals the number of scene categories) that compute the feature vectors of the instances. Finally, multiple instance learning pooling is used to obtain the bag-level category probabilities, and the classical cross-entropy loss function is used to minimize the loss between the bag-level prediction and the true label, so that the whole framework is optimized through forward and backward propagation. The 
experimental results show that the classification accuracy of MSAP-CNN is 0.21% and 1.19% higher than that of MSA-CNN on the UC Merced and NWPU-RESISC45 data sets, respectively. The MSAP-CNN model based on multiple instance learning pooling can effectively mine the relations among instances and between instances and labels in an image, further improving the classification accuracy of complex scene categories. |
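
The classification head described in (2) can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. Mean pooling stands in for the multiple instance learning pooling, whose exact form is not specified here, and the function names, the channel-wise stacking, and the toy values are all assumptions made for illustration.

```python
import math


def stack_scales(feature_maps):
    """Concatenate per-position channel vectors from several scales.

    Assumption: every map has been resized to the same flattened H*W grid,
    so position i in one map corresponds to position i in the others.
    """
    return [[x for vec in position for x in vec] for position in zip(*feature_maps)]


def instance_scores(instances, filters, biases):
    """Apply N 1x1 convolution filters: one dot product per position and class.

    instances: H*W vectors of length C; filters: N vectors of length C.
    Returns H*W score vectors of length N (one score per scene category).
    """
    return [
        [sum(w * x for w, x in zip(filt, inst)) + b for filt, b in zip(filters, biases)]
        for inst in instances
    ]


def mil_mean_pool(scores):
    """Aggregate instance scores into bag-level scores.

    Mean pooling is a hypothetical stand-in for the pooling used in MSAP-CNN.
    """
    n_inst = len(scores)
    return [sum(s[k] for s in scores) / n_inst for k in range(len(scores[0]))]


def softmax(logits):
    """Numerically stable softmax over the class axis."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Cross-entropy between bag-level probabilities and the true scene label."""
    return -math.log(probs[label])


# Toy example: 2 spatial positions, two scales (2 channels + 1 channel), N = 2 classes.
scale_a = [[1.0, 0.0], [0.0, 1.0]]
scale_b = [[2.0], [4.0]]
filters = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
biases = [0.0, 0.0]

stacked = stack_scales([scale_a, scale_b])
bag_probs = softmax(mil_mean_pool(instance_scores(stacked, filters, biases)))
loss = cross_entropy(bag_probs, label=1)
print(bag_probs, loss)
```

Because each 1x1 filter produces one score per spatial position and per category, every position acts as an instance and the whole image acts as the bag. In training, the loss would be backpropagated through the pooling step into the filters and the feature extractors.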