
A Study Of Multi-modal Image Fusion Algorithm Based On Transformer Models

Posted on: 2024-04-27
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Wang
Full Text: PDF
GTID: 2568307079473144
Subject: Electronic information
Abstract/Summary:
Multi-modal images are data acquired by different sensors for the same scene, each expressing different content. The significance of multi-modal image fusion is to perform modal alignment, modal representation, and modal fusion of source images carrying different modal information, so as to obtain a fused image richer in information than any single-modal source image. This thesis focuses on infrared and visible images and builds a general deep learning network for multi-modal image fusion tasks.

Firstly, this thesis classifies the existing deep learning models applied to this task into explicit and implicit models, and analyses the problems faced at different stages of development together with the corresponding optimization schemes. Secondly, it categorizes the steps of building multi-modal fusion networks and considers the problems to be solved in this task: how to extract and retain the features of unimodal images in a multi-scale manner, and how to reduce the impact of cross-modal feature variability on the fusion results.

Based on this analysis, the thesis proposes a Transformer-based cross-modal fusion algorithm. The network uses a feature encoder with a CNN-Transformer tandem structure to retain the local and global features of the image, a cross-modal attention fusion network to interact and fuse the features of the two modalities, and a neural network decoder to reconstruct the fused image. The proposed model is compared with five deep learning algorithms. The objective metrics show that it is effective on MI, SSIM, and Q_CV, reflecting good retention of texture detail and global information; however, it does not perform well on AG or EI.

Building on the above model, this thesis further proposes BIMT, a cross-modal fusion model based on the bilinear model, with a parallel CNN-Transformer coding network structure and a second-order fusion strategy. This thesis introduces a
bilinear attention fusion strategy to perform interactive fusion of the local and global features of the multi-modal source images, yielding two first-order fusion features. These are then fed into the cross-modal attention fusion network together with the global features of the two modalities to obtain the second-order fusion result. Finally, the fused features are concatenated along the channel dimension and passed to the decoder network to reconstruct the fused image. The proposed BIMT model is compared with eight state-of-the-art deep learning algorithms; the experimental results show that BIMT achieves the best results on the MI, Q_AB/F, and Q_CV metrics, indicating good retention of texture detail and information. Finally, this thesis offers optimization suggestions for the open problems of deep learning models in this research direction and an outlook on application prospects in the field of multi-modal image fusion.
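The two fusion mechanisms named in the abstract can be sketched in miniature. The NumPy snippet below is an illustrative assumption, not the thesis's actual implementation (the abstract gives no code-level detail): `cross_modal_attention` shows tokens from one modality (e.g. infrared) attending to the other (e.g. visible), and `bilinear_fusion` shows a first-order bilinear interaction between two modal feature vectors. All function names, weight shapes, and dimensions are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(feat_a, feat_b, w_q, w_k, w_v):
    """Cross-modal attention: tokens of modality A (queries) attend to
    tokens of modality B (keys/values). feat_a, feat_b: (tokens, dim)."""
    q = feat_a @ w_q
    k = feat_b @ w_k
    v = feat_b @ w_v
    scores = (q @ k.T) / np.sqrt(k.shape[-1])   # scaled dot-product
    return softmax(scores, axis=-1) @ v          # (tokens_a, dim)

def bilinear_fusion(feat_a, feat_b, w):
    """First-order bilinear interaction between two modal feature vectors.
    w has shape (d_a, d_b, d_out); output_k = feat_a^T W_k feat_b."""
    return np.einsum('i,ijk,j->k', feat_a, w, feat_b)

# Toy usage: 16 tokens of dimension 32 per modality.
rng = np.random.default_rng(0)
ir, vis = rng.normal(size=(16, 32)), rng.normal(size=(16, 32))
fused = cross_modal_attention(ir, vis, np.eye(32), np.eye(32), np.eye(32))
first_order = bilinear_fusion(ir[0], vis[0], rng.normal(size=(32, 32, 8)))
```

In the second-order strategy the abstract describes, outputs like `first_order` would be fed back into the cross-modal attention step together with the global features before channel-wise concatenation and decoding.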
Keywords/Search Tags:Multi-modal Image Fusion, Deep Learning, Transformer, ViT, Bilinear Model