With the development of hardware, sensor technology has also advanced rapidly. Because different sensors operate on different physical principles, the images they capture convey different content, and in many scenarios multiple sensors are therefore used to observe different aspects of the same scene. Precisely because the scene information returned by each sensor differs, the sensors complement one another and together provide more comprehensive input for computer vision tasks. For example, an infrared sensor captures thermal radiation and exploits the high contrast of infrared images to separate salient objects from the background, whereas a visible-light sensor captures reflected light and produces images with relatively rich texture, but cannot highlight the targets that stand out in infrared images. To obtain complete information about a scene, the infrared and visible images must be fused: redundant information is removed, complementary information is retained, and a single image drawing on multiple sensor sources is produced, providing better visual perception and decision support.

Traditional infrared and visible image fusion methods usually decompose the source images into high-frequency and low-frequency components: the high-frequency components carry the detailed texture and overall contours of the scene, the low-frequency components carry its brightness distribution, and the two are then fused in a complementary manner. Because this approach ignores the differences between the two kinds of image features, however, the fusion results may suffer from low contrast, texture loss, and artifacts. With the development of deep learning, more researchers have begun to apply deep learning techniques to infrared and visible image fusion. Although great progress has been made, some issues remain unresolved. For example, some deep learning methods do not account for the differences between modalities when extracting features from the two types of images, which leads to lost feature information and noticeably degraded target brightness in the fused image. In addition, failing to fully exploit the features of every level during fusion can leave the fused image with missing or inaccurate information.

To address these shortcomings of deep learning based infrared and visible image fusion, this paper proposes two new deep fusion networks. The main work of this paper is summarized as follows:

(1) Because some deep learning methods do not consider the differences between the characteristics of different modal images, this paper proposes an infrared and visible image fusion network based on modal feature attention. The network comprises three stages: feature extraction, feature fusion, and fused image reconstruction. In the feature fusion stage, an Attention Feature Fusion Module (AFFM) is constructed, which promotes the effective fusion of complementary multimodal features and improves fusion performance over multiple iterations. In addition, to better supervise the training of the network, a new wavelet-decomposition-based loss function and a saliency loss function are proposed to constrain the fusion results, so that the final fused image contains both the rich texture information of the visible image and the salient targets of the infrared image.
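As an illustration of the attention-weighted fusion idea behind a module such as AFFM, the following is a minimal PyTorch sketch that derives per-channel weights from the concatenated infrared and visible feature maps and combines the two by weighted summation; the class name, channel sizes, and the exact attention form are assumptions made for illustration, not the implementation described in this paper.

```python
import torch
import torch.nn as nn

class AttentionFeatureFusion(nn.Module):
    """Illustrative fusion of infrared and visible feature maps with learned channel weights."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)          # global context per channel
        self.mlp = nn.Sequential(                    # bottleneck MLP producing fusion weights
            nn.Conv2d(2 * channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 2 * channels, kernel_size=1),
        )

    def forward(self, feat_ir: torch.Tensor, feat_vis: torch.Tensor) -> torch.Tensor:
        # Concatenate modality features and derive per-channel fusion weights.
        joint = torch.cat([feat_ir, feat_vis], dim=1)
        weights = torch.sigmoid(self.mlp(self.gap(joint)))
        w_ir, w_vis = torch.chunk(weights, 2, dim=1)
        # Weighted combination keeps complementary information from both modalities.
        return w_ir * feat_ir + w_vis * feat_vis

if __name__ == "__main__":
    affm = AttentionFeatureFusion(channels=64)
    ir = torch.randn(1, 64, 128, 128)
    vis = torch.randn(1, 64, 128, 128)
    print(affm(ir, vis).shape)  # torch.Size([1, 64, 128, 128])
```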
Subjective and objective comparison experiments and ablation studies on the TNO dataset show that the proposed method achieves better fusion results than several current state-of-the-art methods.

(2) Because some deep learning methods do not fully consider the differences between feature levels in multimodal images, this paper proposes a dual-encoder fusion network for infrared and visible images based on multi-layer feature fusion, which can effectively fuse the features of the source images at different levels. Within the network, a Deep Semantic Information Fusion Module (DSIFM) is proposed to merge deep features of different scales. In addition, considering the differences between infrared and visible features, a Shallow-Middle Information Fusion Module (SMIFM) is designed to integrate the shallow and middle layer features obtained by the two encoders with the deep features propagated through the network. Furthermore, to better preserve the salient target information and texture features of the source images in the fusion results, a joint loss function consisting of an intensity loss and a structural similarity loss is defined to supervise the training of the network. Extensive subjective and objective experiments on the TNO and RoadScene datasets confirm, both qualitatively and quantitatively, the advantages of the proposed method.
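As a rough illustration of how such a joint loss could supervise training, the sketch below combines a pixel intensity term that pulls the fused image toward the element-wise maximum of the two sources (to retain salient infrared targets) with a gradient-based structural term as a simple stand-in for a structural similarity term; the specific terms, the weighting, and the function names are assumptions for illustration, not the formulation used in this work.

```python
import torch
import torch.nn.functional as F

def gradient(img: torch.Tensor) -> torch.Tensor:
    """Approximate image gradients with Sobel filters; expects a (B, 1, H, W) tensor."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=img.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(img, kx, padding=1)
    gy = F.conv2d(img, ky, padding=1)
    return torch.abs(gx) + torch.abs(gy)

def joint_fusion_loss(fused: torch.Tensor, ir: torch.Tensor, vis: torch.Tensor,
                      alpha: float = 0.5) -> torch.Tensor:
    # Intensity term: keep the brighter (often the more salient) response of the two sources.
    intensity = F.l1_loss(fused, torch.maximum(ir, vis))
    # Structural term: match the stronger gradient of the two sources (texture detail).
    structure = F.l1_loss(gradient(fused), torch.maximum(gradient(ir), gradient(vis)))
    return intensity + alpha * structure
```

Taking the element-wise maximum of the sources is a common heuristic in fusion losses for preserving the stronger response at each pixel; a full structural similarity term would typically replace or complement the gradient term used here.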