| The main task of Earth remote sensing satellites is to acquire accurate remote sensing images from which rich information about the Earth's surface can be extracted. With the advancement of remote sensing technology, Earth observation satellites acquire high-precision remote sensing images that contain massive amounts of information on different ground features, and these images are widely used in fields such as geological mapping, agricultural remote sensing, and environmental monitoring. However, due to the physical limitations of satellite sensors, a single sensor cannot acquire images with both high spatial and high spectral resolution. Most satellites therefore carry multiple sensors that separately acquire multispectral (MS) images with high spectral resolution and panchromatic (PAN) images with high spatial resolution. Consequently, how to fully fuse multi-source remote sensing data to obtain multispectral images with high spatial resolution has become an active and urgent problem in remote sensing image processing. This paper addresses the fusion of MS and PAN images within a deep learning framework. Taking the full exploitation of the intrinsic spatial and spectral information contained in multi-source images as its starting point, it introduces new theories and methods such as asymmetric convolution, difference-based strategies, and large kernel attention mechanisms, and explores how to fully exploit remote sensing image information and how to find optimal fusion architectures. The main research contributions are as follows: (1) Traditional fusion algorithms based on multi-resolution analysis generate high-resolution MS images with good spectral consistency but are prone to spatial distortion. To address this, an MS and PAN image fusion method based on a novel multi-level residual network with detail injection is proposed. First, to alleviate the spatial 
distortion caused by multi-resolution analysis, a difference strategy is combined with the multi-resolution analysis method, and the corresponding injection coefficients are obtained with a multi-scale residual module so that spatial information is effectively mapped to each band of the MS image. Second, an enhanced residual network (ResNet) is proposed for feature extraction; it combines the asymmetric convolution block (ACB) with the residual network to obtain more robust features and strengthen local key points. In addition, an Inception feature pyramid is designed to fuse features at different levels while enriching the spatial information of the fusion results. (2) Fusion algorithms based on the Transformer architecture pay insufficient attention to information in the channel dimension. To address this, a local-global high-resolution spatial-spectral representation network is proposed to fully fuse local and global spatial-spectral information at different scales. The network adopts a multi-scale fusion architecture to capture the scale information of remote sensing images. A local-global feature extraction module is also proposed to capture local and global dependencies in the source images from the spatial and channel perspectives, respectively, so that spatial texture and spectral information are learned efficiently. In addition, to obtain more representative information, a multi-scale contextual aggregation module is proposed to integrate hierarchical information with high representational power. (3) Image fusion methods based on the vision transformer (ViT) not only neglect adaptability in the channel dimension but also incur a computational cost that grows quadratically with the number of image tokens, which is excessive for high-resolution remote sensing images. To address this, a lightweight fusion network with a pure convolutional architecture is designed, and a multi-scale feature enhancement network based on 
a large kernel attention mechanism is proposed. The aim is to achieve adaptability in both the spatial and channel dimensions at a small computational cost, and to fuse multiple features efficiently while semantically enhancing deep features. Specifically, we propose ML blocks that combine the advantages of CNNs and ViTs and contain multi-scale residual blocks and large kernel attention blocks. The ML blocks enable channel adaptation and capture multi-scale local and long-range spatial features at a lower computational cost. Furthermore, a Pyramid Squeeze Attention (PSA) block is used to fuse multiple features efficiently and promote cross-channel feature interaction. Finally, an asymmetric convolutional UNet (ACUNet) is proposed to enhance deep semantic features and fully fuse contextual information. |
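To make the difference-based detail injection in contribution (1) more concrete, the NumPy sketch below shows the classical multi-resolution-analysis scheme it builds on: each upsampled MS band receives the PAN high-frequency details (PAN minus a low-pass version of PAN), scaled by a per-band injection coefficient. In this paper the coefficients come from a learned multi-scale residual module; here a simple covariance-based gain stands in for that module. The block-average low-pass filter, the function names, and the default ratio of 4 are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def box_lowpass(img: np.ndarray, ratio: int) -> np.ndarray:
    """Crude low-pass filter (assumption): average ratio x ratio blocks, then
    replicate each block mean back to full size. Requires H and W divisible by ratio."""
    h, w = img.shape
    blocks = img.reshape(h // ratio, ratio, w // ratio, ratio).mean(axis=(1, 3))
    return np.kron(blocks, np.ones((ratio, ratio)))


def detail_injection_fusion(ms_up: np.ndarray, pan: np.ndarray, ratio: int = 4) -> np.ndarray:
    """ms_up: (bands, H, W) MS image already upsampled to the PAN grid; pan: (H, W)."""
    pan_low = box_lowpass(pan, ratio)
    details = pan - pan_low  # difference strategy: high-frequency spatial details of PAN
    fused = np.empty(ms_up.shape, dtype=float)
    for k, band in enumerate(ms_up):
        # Hypothetical stand-in for the learned injection coefficient:
        # g_k = cov(MS_k, PAN_low) / var(PAN_low)
        cov = np.cov(band.ravel(), pan_low.ravel())
        g_k = cov[0, 1] / cov[1, 1] if cov[1, 1] > 0 else 0.0
        fused[k] = band + g_k * details  # inject scaled details into band k
    return fused
```

For example, if an upsampled band equals `2 * box_lowpass(pan, 4) + 1`, its gain is 2 and the fused band becomes `2 * pan + 1`: the band keeps its own low-frequency content and gains the PAN details at the appropriate scale.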
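The asymmetric convolution block (ACB) used in contributions (1) and (3) is assumed here to follow the common ACNet-style design: a square 3x3 kernel runs in parallel with 1x3 and 3x1 kernels, and the branch outputs are summed. Because convolution is linear, the three kernels can be merged into a single 3x3 kernel after training. The single-channel NumPy sketch below, with batch normalization omitted and hypothetical helper names, checks that equivalence.

```python
import numpy as np


def conv2d_same(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Single-channel 2D cross-correlation with zero padding ('same' output size)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out


def acb_forward(x, k_sq, k_hor, k_ver):
    """Training-time ACB (assumed ACNet-style): 3x3, 1x3 and 3x1 branches summed."""
    return conv2d_same(x, k_sq) + conv2d_same(x, k_hor) + conv2d_same(x, k_ver)


def fuse_acb_kernels(k_sq, k_hor, k_ver):
    """Deploy-time fusion: add the asymmetric kernels onto the 3x3 skeleton."""
    fused = k_sq.astype(float).copy()
    fused[1:2, :] += k_hor  # 1x3 kernel lands on the middle row
    fused[:, 1:2] += k_ver  # 3x1 kernel lands on the middle column
    return fused
```

Since the merged kernel produces the same output, the extra branches add capacity during training without increasing inference cost. This is the usual motivation for ACB-style designs, although the variant used in this paper may differ, for example in how normalization layers are folded.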