As an important data source for Earth observation, remote sensing images are widely used in agriculture, forestry, urban monitoring, and other fields. With the increasing variety of sensors and remote sensing platforms, the volume of remote sensing image data is growing dramatically. Faced with this massive amount of data, deep learning methods have shown powerful feature extraction capabilities, but they rely on the prerequisite assumption that the training and test sets share the same distribution. However, the distribution of remote sensing images varies with sensor conditions, time, and space, and this sensor and spatio-temporal heterogeneity fundamentally breaks the above assumption, significantly impairing the interpretation performance and applicability of deep learning models. Therefore, how to extract effective features from remote sensing images and achieve reliable interpretation has become an important research topic in the field of remote sensing. Although it is theoretically possible to learn stable invariant representations from a large number of remote sensing images with different distributions to cope with this heterogeneity, each image is only a single point in a specific sensor-time-space three-dimensional space, and collecting a sufficiently representative sample from dynamic, continuous remote sensing data is almost impossible in practice. In this regard, if the various possible distributions of remote sensing images can be flexibly modeled, there is an opportunity to learn robust representations in a cost-effective manner, enabling high-performance interpretation across sensors, space, and time. Studies have shown that the visual distribution of remote sensing images can be described in terms of styles, i.e. visual differences among images from different sensors and spatio-temporal conditions can be abstracted as style differences. Based on this, this paper simulates the possible heterogeneous distributions of remote sensing images through deep styles, in order to learn the
invariant representation of visual features of remote sensing images. Specifically, this paper investigates the temporal and spatial heterogeneity of satellite images and near-ground images respectively. The main research results are as follows.

(1) To address the problem that the temporal heterogeneity of satellite images degrades model generalizability, a visual feature representation method for satellite images based on a style-invariant loss is proposed. The visual appearance of images at different time periods is simulated by random color enhancement, a contrastive learning framework is borrowed to learn style-invariant representations, and a style-invariant loss is introduced to reduce the distance between images of the same object in different styles. To verify its effectiveness, the method is tested on several downstream tasks. The experimental results show a large improvement in the scene classification task: 16.91% higher than random initialization and 16.42% higher than MoCo v2. However, it did not show significant gains on the pixel-level classification task.

(2) Since random color enhancement cannot adequately mitigate the impact of the temporal heterogeneity of satellite images, a GAN-augmentation-based visual feature representation method for satellite images is proposed. A GAN is used to transform images between two seasons, increasing the realism of the simulated temporal variation. The results show significant accuracy improvements on several downstream tasks. Compared with SimCLR, the method improved accuracy by 6.65% in the scene classification task, OA by 2.72% in the semantic segmentation task, and F1 score by 8.55% in the change detection task. In addition, this paper finds that the method alleviates, to a certain extent, the "same object, different spectra; same spectrum, different objects" problem in remote sensing
images, i.e. it has learned a style-invariant representation of images.

(3) To address the similarities and differences of urban landscapes located in different spaces, a visual feature representation method based on deep style learning is proposed for near-ground images. The statistical features of style are used as the criteria for characterizing urban landscapes, and two aspects are analyzed: the similarity of landscapes between cities and the types of landscape within a city. The study shows that two cities that are geographically close to each other tend to exhibit greater style similarity. In addition, the method can help retrieve scenes with similar or distinctive landscapes within a city, which facilitates the mining of urban landscapes.

Figures 36, Tables 12, References 129.
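The style-invariant contrastive objective summarized in result (1) can be sketched as follows. This is a minimal illustration only: the function name, the NumPy formulation, and the InfoNCE-style form of the loss are assumptions for exposition, not the exact implementation used in this work.

```python
import numpy as np

def style_invariant_loss(z_a, z_b, temperature=0.5):
    """InfoNCE-style contrastive loss (assumed form).

    z_a, z_b: (N, D) L2-normalized embeddings; row i of z_a and row i
    of z_b are the same scene rendered in two different styles (e.g.
    two random color enhancements). The loss pulls these positive
    pairs together and pushes all other pairs apart.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)   # (2N, D) joint batch
    sim = z @ z.T / temperature              # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)           # a sample is not its own pair
    # the positive for sample i is its other-style view
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

# Sanity check: embeddings that coincide across styles score a lower
# loss than embeddings that ignore the style pairing entirely.
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
z /= np.linalg.norm(z, axis=1, keepdims=True)
w = rng.normal(size=(8, 16))
w /= np.linalg.norm(w, axis=1, keepdims=True)
print(style_invariant_loss(z, z) < style_invariant_loss(z, w))  # True
```

Minimizing this loss drives the encoder to map the same scene, under any simulated style, to nearby points in embedding space, which is what "closing the distance between images of the same object but different styles" amounts to.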
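The use of style statistics as a landscape criterion in result (3) can be illustrated with one common choice of statistics from the style-transfer literature, the per-channel mean and standard deviation of a deep feature map; the actual statistics and distance used in this work may differ, so treat the names and formulas below as assumptions.

```python
import numpy as np

def style_statistics(feat):
    """Summarize the 'style' of a (C, H, W) feature map by its
    per-channel mean and standard deviation (a common convention in
    style-transfer work; assumed here for illustration)."""
    mu = feat.mean(axis=(1, 2))
    sigma = feat.std(axis=(1, 2))
    return np.concatenate([mu, sigma])

def style_distance(feat_a, feat_b):
    """Euclidean distance between style statistics: small values mean
    visually similar landscapes under this criterion."""
    diff = style_statistics(feat_a) - style_statistics(feat_b)
    return float(np.linalg.norm(diff))

# A feature map compared with itself has zero style distance, while a
# globally brightened copy (shifted channel means) does not.
rng = np.random.default_rng(1)
f = rng.normal(size=(4, 8, 8))
print(style_distance(f, f))             # 0.0
print(style_distance(f, f + 1.0) > 0)   # True
```

Ranking city-scene pairs by such a distance gives exactly the kind of criterion described above: nearby cities whose scenes yield small average style distances are judged to have similar urban landscapes, and within a city, scenes with unusually large distances to the rest stand out as distinctive.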