With the development of modern medical imaging techniques, different imaging modalities have arisen. Single-modality medical images are limited in their ability to characterise detail. To address this limitation, fusion of medical images from multiple modalities can be used in clinical practice to compensate for the deficiencies of single-modality images while preserving the features of the original images, presenting an abundance of detailed information and facilitating accurate diagnosis and treatment of disease. We identify two problems in multimodal medical image fusion and address them as follows.

First, existing image fusion methods extract image features at only a single scale, so features at other scales of the medical images are lost, which degrades the deep detail representation and clarity of the fused image. To address this, we propose a multimodal medical image fusion method based on an improved U-Net3+ with cross-modal attention blocks and a dual-discriminator generative adversarial network (UC-DDGAN). UC-DDGAN contains a generator and two discriminators, with the generator comprising feature extraction and feature fusion. The feature extraction part embeds cross-modal attention blocks into the U-Net3+ downsampling path so that cross-modal features and deep features are extracted alternately, yielding a composite feature map for each layer. These composite feature maps are then channel-concatenated and up-sampled to output feature maps containing full-scale deep features for both modalities. The feature fusion part uses a Concat layer to perform channel concatenation and convolution on the two modalities' full-scale feature maps to obtain the fused image. The two discriminators perform targeted discrimination on the source images of different distributions respectively. The loss function introduces a gradient loss, which is weighted with a pixel loss to optimise the generator.

Second, because UC-DDGAN has a complex structure, it requires a large amount of data for training. To address this, we propose a multimodal medical image fusion generative adversarial network based on knowledge distillation and explainable AI modules (KDE-GAN). KDE-GAN uses UC-DDGAN as the baseline model, improves the generator through knowledge distillation and introduces explainable AI modules into the discriminator. The knowledge distillation first uses the U-Net3+ with cross-modal attention blocks as the teacher network, then reconstructs the student network by referring to the binarised output of each cross-modal attention block, and finally uses the output of the teacher network to guide student-network training. The explainable part first uses the explainable AI modules in the discriminator to obtain explainable images of the results, then judges the discriminator's performance from these images and dynamically decides whether the discriminator needs further optimisation. The student network obtained by knowledge distillation replaces the feature extraction part of the baseline model's generator and, together with the discriminator equipped with explainable AI modules, ensures that medical fusion images with clear deep details are still obtained even when training on small datasets.
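The following is a minimal sketch of the teacher-guided training described above, written in PyTorch purely for illustration. The tiny placeholder networks, the MSE distillation loss, the learning rate and the dummy data loader are assumptions of this sketch, not the thesis's actual implementation: the real teacher is the U-Net3+ with cross-modal attention blocks, and the real student is reconstructed from the blocks' binarised outputs.

```python
import torch
import torch.nn as nn

# Placeholder stand-ins for the teacher and student networks (illustrative only).
teacher = nn.Sequential(nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(16, 1, 3, padding=1))
student = nn.Sequential(nn.Conv2d(2, 4, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(4, 1, 3, padding=1))
teacher.eval()                                   # the teacher is frozen

distill_loss = nn.MSELoss()                      # student mimics the teacher's output
optimiser = torch.optim.Adam(student.parameters(), lr=1e-4)

# Dummy loader of aligned CT-MR pairs, stacked as a 2-channel input.
loader = [torch.rand(8, 2, 64, 64) for _ in range(10)]

for pair in loader:
    with torch.no_grad():
        target = teacher(pair)                   # teacher output as the soft target
    output = student(pair)
    loss = distill_loss(output, target)          # guide the student with the teacher
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
```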
Based on the two image fusion methods proposed in this paper, we have developed a multimodal medical image fusion system on the PyCharm platform. The system contains three modules: a login module, a home module and a fusion module. The fusion module takes a pair of aligned CT-MR images as input and outputs the subjective visual results and objective index results of the corresponding fused images. The system can be used clinically to help doctors understand the subjective and objective evaluation of the fused images and to select the medical fusion images with better results for disease diagnosis.
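As a rough illustration of the fusion module's input/output flow, the sketch below loads an aligned CT-MR pair, applies a fusion model and reports one objective index. The function name, the use of OpenCV for image loading and the choice of image entropy as the index are assumptions of this sketch, not the system's actual interface or the objective indices used in the thesis.

```python
import cv2
import numpy as np

def fuse_and_evaluate(ct_path, mr_path, fuse):
    """Fuse an aligned CT-MR pair and return the fused image together with
    one example objective index (image entropy)."""
    ct = cv2.imread(ct_path, cv2.IMREAD_GRAYSCALE)
    mr = cv2.imread(mr_path, cv2.IMREAD_GRAYSCALE)
    fused = fuse(ct, mr)   # e.g. a trained UC-DDGAN or KDE-GAN generator

    # Image entropy of the fused result, used here only as a placeholder index.
    hist, _ = np.histogram(fused, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    entropy = float(-np.sum(p * np.log2(p)))
    return fused, {"entropy": entropy}
```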