As an important application of remote sensing, land cover classification is of great significance for land planning, surface monitoring, disaster relief, and related tasks. Land cover classification of remote sensing imagery aims to assign the corresponding surface-category label to each pixel of the image. Traditional approaches, constrained by the sensors available, usually capture imagery with a single sensor and classify the resulting single-source (single-modal) data with existing methods. However, owing to complex climatic conditions and the limitations of any single sensor in imaging principle, orbit, revisit frequency, and stability, land cover classification from single-modal remote sensing data faces serious limitations. Using multi-modal remote sensing images is therefore of great significance for obtaining stable classification results.

Remote sensing data sources are usually divided into optical images and synthetic aperture radar (SAR) images. Optical images are obtained by sensors that capture the spectral information formed by sunlight reflected from the Earth's surface. Because this is passive imaging, it yields good image quality and high resolution, but it is easily affected by climatic conditions such as cloud cover. SAR images are generated by a satellite that actively transmits specific electromagnetic signals and receives the signals reflected from the surface. This active imaging is robust to weather, but suffers from comparatively low resolution and poor visual interpretability. To overcome the demanding imaging conditions and susceptibility to interference of optical imagery, SAR data, which is heterogeneous with optical data, can be used for land cover classification, reducing the negative impact of single-modal data corruption. Moreover, multi-modal data carry more information conducive to land cover classification and can serve as complementary evidence that improves classification accuracy. However, because of the diversity of data sources, the wide range of data distributions, and the differing sensitivity of different sensors to surface features, the heterogeneity of multi-modal data often hinders its effective exploitation.

To address these problems, this paper designs a multi-modal remote sensing image classification method based on deep learning networks. The main contributions are as follows.

(1) A multi-modal remote sensing image dataset is built. Because optical images offer high resolution and good visual quality, they are convenient for land cover classification research, and most existing land cover classification datasets consist mainly of optical images; the few datasets involving SAR images have low source resolution and poor annotation quality, and multi-modal land cover classification datasets are essentially a blank. To support this research, this paper presents a multi-modal land cover classification dataset based on the Gao Fen-2 (GF-2) optical satellite and the Gao Fen-3 (GF-3) SAR satellite. The dataset includes three remote sensing images with different surface features and three climate categories under the original meteorological conditions. The raw-data processing for the two satellites and the procedure for producing the multi-modal dataset are also described in detail. To test performance under complex climatic conditions, the optical images are additionally degraded with several cloud-simulation methods, yielding different cloud-coverage effects.

(2) A multi-modal squeeze-and-excitation (MSE) feature fusion module for deep learning networks is designed. Because multi-modal remote sensing data differ in source and form, there is heterogeneity between modalities, and directly fusing multi-modal data or features may cause the modalities to interfere with each other. Therefore, building on the squeeze-and-excitation technique in deep learning, this paper designs an MSE feature fusion module dedicated to combining the features of different modalities inside a deep network.

(3) A dual-stream deep high-resolution network (DDHRNet) is proposed for multi-modal feature extraction and fusion in remote sensing land cover classification. Unlike traditional deep networks, which extract features by continuously downsampling the image, DDHRNet builds the encoder of each modality on the high-resolution network (HRNet), so that the features of different modalities remain aligned throughout the fusion operations and more location information and small details in the data are retained. In addition, to perform feature fusion efficiently, DDHRNet adopts a deep progressive cascade feature fusion structure: feature maps of different scales are fused at different stages of the two modality encoders, yielding an enhanced feature representation conducive to land cover classification.

(4) Based on the MSE module and transformer technology, a multi-modal squeeze-and-excitation transformer (MSETrans) network for multi-modal remote sensing land cover classification is designed. The structure employs state-of-the-art transformer blocks and fuses the deep features of the different modalities through the MSE module. The deep features produced by the transformer structure are then decoded with a lightweight multi-layer perceptron (MLP).
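The cloud-simulation step mentioned in contribution (1) can be sketched as follows. This is a minimal illustration of one common approach, alpha-blending a smoothed random mask over the optical image; the blur width, coverage fraction, and blending form here are illustrative assumptions, not the paper's actual parameters.

```python
import numpy as np

def simulate_clouds(optical, coverage=0.4, softness=15, seed=0):
    """Overlay a synthetic cloud layer on an optical image in [0, 1].

    optical  -- array of shape (H, W, C)
    coverage -- approximate fraction of cloud-affected pixels (assumed knob)
    softness -- box-blur half-width controlling cloud edge smoothness
    """
    rng = np.random.default_rng(seed)
    h, w = optical.shape[:2]
    noise = rng.random((h, w))
    # Smooth the noise with a separable box blur to get cloud-like blobs.
    kernel = np.ones(2 * softness + 1) / (2 * softness + 1)
    for axis in (0, 1):
        noise = np.apply_along_axis(
            lambda m: np.convolve(m, kernel, mode="same"), axis, noise)
    # Threshold so that roughly `coverage` of the pixels become cloudy.
    thresh = np.quantile(noise, 1.0 - coverage)
    alpha = np.clip((noise - thresh) / (noise.max() - thresh + 1e-8), 0, 1)
    # Alpha-blend white clouds (value 1.0) over the optical image.
    return optical * (1 - alpha[..., None]) + alpha[..., None]
```

Varying `coverage` and `softness` produces the different cloud-coverage effects used to stress-test the classifier under simulated adverse weather.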
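The squeeze-and-excitation style of cross-modal fusion behind the MSE module in contribution (2) can be sketched in plain numpy. This is a hedged sketch, not the paper's module: the layer shapes, the summation fusion, and the single shared gating MLP are illustrative assumptions.

```python
import numpy as np

def mse_fuse(opt_feat, sar_feat, w1, b1, w2, b2):
    """Squeeze-and-excitation style fusion of two modality feature maps.

    opt_feat, sar_feat -- feature maps of shape (C, H, W)
    w1, b1, w2, b2     -- weights of a bottleneck excitation MLP
                          (shapes and gating form are assumptions)
    """
    # Squeeze: global average pooling of each modality, then concatenate
    # into a joint channel descriptor of length 2C.
    z = np.concatenate([opt_feat.mean(axis=(1, 2)),
                        sar_feat.mean(axis=(1, 2))])
    # Excitation: bottleneck MLP (ReLU then sigmoid) yields per-channel
    # gates for both modalities jointly, so each modality's weighting can
    # depend on what the other modality observed.
    hidden = np.maximum(0.0, w1 @ z + b1)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))  # shape (2C,)
    c = opt_feat.shape[0]
    g_opt, g_sar = gates[:c], gates[c:]
    # Recalibrate each modality channel-wise and fuse by summation.
    return g_opt[:, None, None] * opt_feat + g_sar[:, None, None] * sar_feat
```

The key idea the sketch preserves is that gating is computed from both modalities together, so unreliable channels of one modality can be suppressed in favor of the other before fusion.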
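The progressive cascade fusion in contribution (3) can likewise be sketched abstractly: per-stage feature maps from the two encoders are fused at each scale, and the fused result is carried into the next, coarser stage. The halving of resolution per stage, the constant channel count, and the use of summation as the fusion operator are all simplifying assumptions for illustration; the paper's actual stage layout is richer.

```python
import numpy as np

def avg_pool2(x):
    """Halve the spatial resolution of a (C, H, W) map by 2x2 mean pooling."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def progressive_cascade_fuse(opt_stages, sar_stages):
    """Fuse per-stage feature maps of two modality encoders, cascading
    each fused stage into the next.

    opt_stages, sar_stages -- lists of (C, H_i, W_i) maps, one per encoder
    stage, each stage at half the resolution of the previous one and with
    the same channel count (illustrative assumptions).
    """
    carried = None
    fused_stages = []
    for o, s in zip(opt_stages, sar_stages):
        f = o + s                       # per-stage cross-modal fusion (sum)
        if carried is not None:
            f = f + avg_pool2(carried)  # cascade the previous fused stage in
        fused_stages.append(f)
        carried = f
    return fused_stages
```

Fusing at every stage, rather than only once at the end, is what lets fine spatial detail from the high-resolution streams influence the coarser, more semantic stages.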