Remote sensing scene classification is a significant research area in remote sensing image information processing. Its goal is to assign remote sensing images to discrete, meaningful land-use and land-cover categories according to the semantic features they contain. Scene classification provides a fundamental tool for a wide range of practical applications and is therefore widely used in geospatial object detection, geographic image retrieval, precision agriculture analysis, natural disaster detection, and other fields.

With the development of remote sensing satellite technology, social media, and mapping software, geo-tagged data from diverse sources has become easier to obtain. Compared with data from a single source, such multi-view (multi-source, multimodal, and multi-perspective) data can provide more useful information, and it is therefore increasingly used in remote sensing tasks. However, multi-source data also brings more sample noise, i.e., incompatibility between the visual content of an image and its semantic label, which prevents multi-view data from realizing its potential advantages. Although current deep learning models can adaptively learn weights for the data, the lack of quantitative research on the credibility of multi-view data makes these models less interpretable, which limits their performance and flexibility in downstream remote sensing tasks.

For aerial-ground dual-view images in particular, an aerial image captures a wide scene covering many objects, and buildings of the same class vary in style, so many samples exhibit intra-class differences and inter-class similarities. Meanwhile, if the scene is shot from too close, the ground image suffers from problems such as overly large targets and severe occlusion. As a result, a large number of low-quality images lacking salient
category information are generated.

To address these problems, this paper introduces the theory of evidential deep learning and uses subjective logic to quantify the credibility of multi-view data, so that multi-view remote sensing scene classification can be performed more reliably. The primary contributions of this paper are as follows.

(1) An evidence-based sample uncertainty quantification framework is proposed to measure the decision risk in the fusion of aerial-ground dual-view images. Following the principle of evidential deep learning, uncertainty estimation is introduced into deep learning to compute a subjective opinion for each view. This subjective opinion yields not only the classification result of the view but also an uncertainty value that evaluates the view's credibility. Experimental results on two datasets show that the more concentrated the evidence, the higher the confidence, and conversely, the more dispersed the evidence, the lower the confidence; the subjective opinion can therefore describe the decision risk of the view.

(2) A credible fusion strategy for scene classification of aerial-ground dual-view images is proposed. Current decision-level fusion strategies do not consider the risk that multi-view data quality poses to the final fusion decision. The proposed strategy introduces the per-view uncertainty quantification into decision fusion: the uncertainty in each subjective opinion is used to assign different weights to different views, giving more weight to views with low decision risk and less weight to views with high decision risk, so that the final result depends more on the low-risk views. Validation with several commonly used deep learning models on two datasets shows that credible fusion significantly improves performance compared with other fusion methods, so it can be effectively applied to remote sensing
scene classification with aerial-ground dual-view fusion.

(3) A loss function, the positive and negative class reciprocal loss, is designed. Uncertainty quantification is incorporated into the loss function to constrain the uncertainty of each view, and the loss also constrains the uncertainty produced after fusion. It can be used not only to train an end-to-end aerial-ground dual-view remote sensing scene classification model, but also to train a fusion classifier without feature extraction, which makes the proposed strategy more flexible. Experiments show that training with the proposed loss function effectively improves performance.
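The subjective opinions of contribution (1) can be illustrated with the standard subjective-logic construction used in evidential deep learning: per-class evidence defines a Dirichlet distribution, from which belief masses and a single uncertainty mass are derived. The excerpt does not give the exact formulas, so the sketch below assumes the common mapping (alpha_k = e_k + 1, b_k = e_k / S, u = K / S with S the Dirichlet strength); the function name is illustrative.

```python
import numpy as np

def subjective_opinion(evidence):
    """Map per-class evidence to a subjective opinion (belief, uncertainty).

    Assumed subjective-logic construction: for K classes, Dirichlet
    parameters alpha_k = e_k + 1, strength S = sum(alpha), belief
    b_k = e_k / S, and uncertainty u = K / S, so sum(b) + u = 1.
    """
    evidence = np.asarray(evidence, dtype=float)
    K = evidence.size
    alpha = evidence + 1.0
    S = alpha.sum()
    belief = evidence / S
    uncertainty = K / S
    return belief, uncertainty

# Concentrated evidence -> low uncertainty (high confidence)
b1, u1 = subjective_opinion([40.0, 1.0, 1.0])
# Dispersed evidence -> high uncertainty (low confidence)
b2, u2 = subjective_opinion([3.0, 3.0, 3.0])
```

This reproduces the behavior reported in the experiments: concentrated evidence yields a low uncertainty value, dispersed evidence a high one.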
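The excerpt does not specify the exact rule behind the credible fusion of contribution (2). A common choice for combining two subjective opinions in evidential multi-view classification is the reduced Dempster combination rule, sketched below under that assumption: a view with low uncertainty dominates the fused opinion, matching the described weighting by decision risk.

```python
import numpy as np

def combine_opinions(b1, u1, b2, u2):
    """Fuse two subjective opinions (belief vector, uncertainty mass).

    Assumed reduced Dempster rule: the conflict C sums belief assigned
    to different classes by the two views, and the fused opinion is
    renormalized by (1 - C). Each input must satisfy sum(b) + u = 1,
    and the output then satisfies the same constraint.
    """
    b1, b2 = np.asarray(b1, dtype=float), np.asarray(b2, dtype=float)
    # sum over i != j of b1_i * b2_j, computed without an explicit loop
    conflict = b1.sum() * b2.sum() - (b1 * b2).sum()
    norm = 1.0 - conflict
    b = (b1 * b2 + b1 * u2 + b2 * u1) / norm
    u = u1 * u2 / norm
    return b, u

# Two conflicting but fairly confident views: the fused opinion keeps
# sum(b) + u = 1 and has lower uncertainty than either input.
bf, uf = combine_opinions([0.6, 0.2], 0.2, [0.1, 0.7], 0.2)
```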
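The exact form of the positive and negative class reciprocal loss in contribution (3) is not given in this excerpt. As a stand-in for how uncertainty quantification enters an evidential loss, the sketch below uses the widely adopted expected mean-squared-error loss under the Dirichlet induced by the evidence; it is not the paper's proposed loss, only a hedged illustration of the general mechanism.

```python
import numpy as np

def evidential_mse_loss(evidence, y):
    """Expected squared error under Dirichlet(alpha), alpha = evidence + 1.

    Assumed standard evidential deep learning loss (not the paper's
    reciprocal loss): the first term is the squared error of the
    expected class probabilities p = alpha / S, the second is their
    predictive variance, which penalizes confident wrong evidence.
    """
    evidence = np.asarray(evidence, dtype=float)
    y = np.asarray(y, dtype=float)
    alpha = evidence + 1.0
    S = alpha.sum()
    p = alpha / S
    return float(((y - p) ** 2 + p * (1.0 - p) / (S + 1.0)).sum())

# Concentrated correct evidence is rewarded with a lower loss than
# dispersed, uncertain evidence for the same one-hot label.
loss_concentrated = evidential_mse_loss([40.0, 1.0, 1.0], [1.0, 0.0, 0.0])
loss_dispersed = evidential_mse_loss([3.0, 3.0, 3.0], [1.0, 0.0, 0.0])
```

In an end-to-end model, such a term would be evaluated on each view's evidence and on the fused opinion, consistent with the paper's description of constraining both per-view and post-fusion uncertainty.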