
A Study Of Multi-modal Image Fusion Algorithm Based On Transformer Models

Posted on: 2024-04-27
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Wang
Full Text: PDF
GTID: 2568307079473144
Subject: Electronic information
Abstract/Summary:
Multi-modal images are data acquired by different sensors for the same scene, each expressing different content. The significance of multi-modal image fusion is to perform modal alignment, modal representation, and modal fusion of source images carrying different modal information, so as to obtain a fused image richer in information than any single-modal source image. This thesis focuses on infrared and visible images and builds a general deep learning network for multi-modal image fusion tasks.

Firstly, this thesis classifies the existing deep learning models applied to this task into explicit and implicit models, and analyses the problems faced at different stages of development together with the corresponding optimization schemes. Secondly, it categorizes the steps of building multi-modal fusion networks and considers the problems to be solved in this task: how to extract and retain the features of unimodal images in a multi-scale manner, and how to reduce the impact of cross-modal feature variability on the fusion results.

Based on this analysis, the thesis proposes a Transformer-based cross-modal fusion algorithm. The network uses a feature encoder with a CNN-Transformer tandem structure to retain the local and global features of the image, a cross-modal attention fusion network to interact and fuse the features of the two modalities, and a neural network decoder to reconstruct the fused image. The proposed model is compared with five deep learning algorithms. The objective metrics show that it is effective on MI, SSIM, and Q_CV, reflecting good retention of texture detail and global information; however, it does not perform well on AG or EI.

Building on the above model, this thesis further proposes BIMT, a cross-modal fusion model based on the bilinear model, with a parallel CNN-Transformer coding network structure and a second-order fusion strategy. This thesis introduces a
bilinear attention fusion strategy to perform interactive fusion of the local and global features of the multi-modal source images, yielding two first-order fusion features. These are then fed into the cross-modal attention fusion network together with the global features of the two modalities to obtain the second-order fusion result. Finally, the fused features are concatenated along the channel dimension and passed to the decoder network to reconstruct the fused image. The proposed BIMT model is compared with eight state-of-the-art deep learning algorithms; the experimental results show that BIMT achieves the best results on the MI, Q_AB/F, and Q_CV metrics, indicating good retention of texture detail and information. Finally, this thesis offers optimization suggestions for the open problems of deep learning models in this research direction and an outlook on application prospects in the field of multi-modal image fusion.
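The two fusion mechanisms named in the abstract can be sketched in miniature. The NumPy snippet below is an illustrative assumption, not the thesis's actual implementation (the abstract gives no code-level detail): `cross_modal_attention` shows tokens from one modality (e.g. infrared) attending to the other (e.g. visible), and `bilinear_fusion` shows a first-order bilinear interaction between two modal feature vectors. All function names, weight shapes, and dimensions are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(feat_a, feat_b, w_q, w_k, w_v):
    """Cross-modal attention: tokens of modality A (queries) attend to
    tokens of modality B (keys/values). feat_a, feat_b: (tokens, dim)."""
    q = feat_a @ w_q
    k = feat_b @ w_k
    v = feat_b @ w_v
    scores = (q @ k.T) / np.sqrt(k.shape[-1])   # scaled dot-product
    return softmax(scores, axis=-1) @ v          # (tokens_a, dim)

def bilinear_fusion(feat_a, feat_b, w):
    """First-order bilinear interaction between two modal feature vectors.
    w has shape (d_a, d_b, d_out); output_k = feat_a^T W_k feat_b."""
    return np.einsum('i,ijk,j->k', feat_a, w, feat_b)

# Toy usage: 16 tokens of dimension 32 per modality.
rng = np.random.default_rng(0)
ir, vis = rng.normal(size=(16, 32)), rng.normal(size=(16, 32))
fused = cross_modal_attention(ir, vis, np.eye(32), np.eye(32), np.eye(32))
first_order = bilinear_fusion(ir[0], vis[0], rng.normal(size=(32, 32, 8)))
```

In the second-order strategy the abstract describes, outputs like `first_order` would be fed back into the cross-modal attention step together with the global features before channel-wise concatenation and decoding.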
Keywords/Search Tags:Multi-modal Image Fusion, Deep Learning, Transformer, ViT, Bilinear Model