
Research And Application On Representation Learning Based Image Fusion Algorithm

Posted on: 2022-03-08    Degree: Doctor    Type: Dissertation
Country: China    Candidate: H Li    Full Text: PDF
GTID: 1488306725951549    Subject: Control Science and Engineering
Abstract/Summary:
Information fusion techniques, and multi-modal information fusion in particular, have been widely applied in many fields that affect daily life, such as military and national defense, civilian security, and smart-city construction. With the rise of these fields comes a large volume of new visual data. From single-modality visual information (RGB images) to infrared (thermal, near-infrared) images, and further to depth images (depth cameras), hyperspectral images (satellites), and medical images (medical equipment), these multi-modal data demonstrate the urgent need for information fusion in real application scenarios. Multi-modal information fusion therefore remains a popular research topic, and combining, processing, and utilizing multi-modal information obtained from different vision devices is of great importance. In this thesis, multi-modal information fusion mainly refers to multi-modal image fusion. The main purpose of image fusion is to generate a single composite image that contains more complementary features and less redundant information than the multi-modal source images. Ideally, a fusion algorithm preserves as much useful information as possible in the final fused image, which benefits downstream computer vision tasks such as object tracking, semantic segmentation, and saliency detection. After 30 years of development, many image fusion algorithms have been proposed, including feature extraction algorithms, fusion strategies, and fusion models. At each milestone in the development of computer vision, the image fusion task has continued to attract attention. In recent years, with the rise of machine learning, representation learning has drawn the attention of many researchers. The progression from traditional image fusion methods (multi-scale transforms, sparse/low-rank representation) to hybrid fusion algorithms (morphological analysis, pulse-coupled neural networks, etc.) reflects continued research enthusiasm for the image fusion task. Moreover, the learning mechanism of deep learning
injects new vitality into image fusion research and reduces the design complexity of feature extraction and fusion strategies, so network architectures can easily be designed for specific image fusion tasks.

Multi-scale transform based fusion algorithms are all built on signal processing techniques (such as wavelets, shearlets, and contourlets). These methods first transform the source data into the frequency domain to obtain multi-scale features; then, with an appropriate fusion strategy and the inverse transform, the final fused image is generated. In contrast, sparse/low-rank representation (SR/LRR) based fusion models process the source data directly in the spatial domain, without any loss of information in a transform step. In SR/LRR based methods, the source images are divided into image patches that are regrouped into a new sample matrix. SR/LRR is then used to compute the coefficients of the sample matrix; with an appropriate fusion strategy for the coefficients and a reconstruction model, the final fused image is obtained.

Deep learning based fusion models can be categorized into three classes: (1) pre-trained network based frameworks; (2) auto-encoder based frameworks; (3) end-to-end network based fusion frameworks. In pre-trained network based frameworks, pre-trained neural networks replace the traditional feature extraction process and are used to extract deep features from the source images. Appropriate fusion strategies then compute fusion weights from the extracted deep features, and the fused image is generated from the weights and the source images. In auto-encoder based fusion models, an encoder extracts features and a decoder generates the fused image. The main difference between the pre-trained network based framework and the auto-encoder based model is that the architecture can be tailored to a specific fusion task in the auto-encoder based
fusion model. In end-to-end network based fusion frameworks, all image fusion processing is replaced by a specifically designed fusion network; with an appropriate architecture and well-designed loss functions, better fusion performance can be achieved.

In this thesis, we explore image fusion theory along the above directions. We first analyze the drawbacks of SR in the image fusion field and improve SR based fusion methods to achieve better performance. Then, targeting the main drawbacks of deep learning based fusion methods, we propose several schemes that further improve fusion performance. The main contributions are:

(1) Low-rank representation (LRR) and dictionary learning (DL) based image fusion methods are proposed. To the best of my knowledge, this is the first time low-rank representation has been applied to the image fusion field. We first combine DL and LRR to extract local and global features from the source images. The histogram of oriented gradients (HOG) is used to classify the image patches divided from the source images, and a global dictionary is learned from these patches. With the global dictionary and LRR, the fused image is obtained. Furthermore, building on the LRR fusion model, a multi-level decomposition and latent low-rank representation (LatLRR) based fusion algorithm is proposed, in which LatLRR is generalized into a deep version and achieves better fusion performance.

(2) Pre-trained neural network based fusion methods are proposed. This is also the first time large pre-trained networks have been applied to image fusion tasks. Because of insufficient training data in the image fusion field, it was initially difficult to train a dedicated deep neural network for fusion tasks. We therefore propose a fusion framework based on a pre-trained VGG-19 and multi-layer deep features. VGG-19 (trained on ImageNet) extracts multi-level deep features from the source images, and the fusion weights of each
source image are obtained by appropriate fusion strategies. Finally, the fused images are reconstructed from the fusion weights and the source images. We also apply ResNet-50 to the image fusion field, using zero-phase component analysis (ZCA) to refine the deep features. These frameworks improve fusion performance and open a new research direction for image fusion.

(3) Auto-encoder based fusion models are proposed. In these models, we explore further possibilities between deep learning and image fusion under a limiting condition (insufficient training data). An auto-encoder network is first designed to reconstruct its input: the encoder extracts deep features and the decoder reconstructs the input. In this training framework, no fusion-specific training data are needed to train the auto-encoder, which partially solves the main problem in image fusion (insufficient training data). In the testing phase, different fusion strategies can be applied for different fusion tasks to obtain the fused image. These fusion models show strong flexibility and extensibility, and many researchers are paying increasing attention to this direction.

(4) End-to-end network based fusion frameworks are proposed. Although the auto-encoder based fusion models achieve good performance, their fusion strategies are still designed manually. Thanks to large multi-modal datasets, end-to-end fusion networks can now be studied. To reduce the manual influence in auto-encoder based models, we propose a learnable network to replace the hand-crafted fusion strategy. With suitable loss functions and a multi-modal training dataset, we obtain better fusion performance and higher-quality fused images. Furthermore, a novel LRR based network architecture is proposed, together with a multi-level loss function that preserves salient information in the fused images. These models are also easy to combine with other computer vision tasks to improve their performance.
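The multi-scale transform pipeline described in the abstract (forward transform, coefficient-level fusion rule, inverse transform) can be sketched with a minimal one-level 2-D Haar transform. This is an illustrative toy, not the thesis's actual implementation; the average rule for the low band and the max-absolute rule for the detail bands are common default choices.

```python
import numpy as np

def haar2d(x):
    """One-level 2-D Haar transform of an even-sized image."""
    a = (x[0::2, :] + x[1::2, :]) / 2.0  # row averages (low-pass)
    d = (x[0::2, :] - x[1::2, :]) / 2.0  # row differences (high-pass)
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0  # approximation band
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0  # horizontal detail
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0  # vertical detail
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0  # diagonal detail
    return ll, (lh, hl, hh)

def ihaar2d(ll, details):
    """Inverse of haar2d (perfect reconstruction)."""
    lh, hl, hh = details
    a = np.empty((ll.shape[0], ll.shape[1] * 2))
    d = np.empty_like(a)
    a[:, 0::2], a[:, 1::2] = ll + lh, ll - lh
    d[:, 0::2], d[:, 1::2] = hl + hh, hl - hh
    x = np.empty((a.shape[0] * 2, a.shape[1]))
    x[0::2, :], x[1::2, :] = a + d, a - d
    return x

def fuse_mst(img_a, img_b):
    """Transform-domain fusion: average the low band, max-abs the detail bands."""
    ll_a, det_a = haar2d(img_a)
    ll_b, det_b = haar2d(img_b)
    ll_f = (ll_a + ll_b) / 2.0
    det_f = tuple(np.where(np.abs(ca) >= np.abs(cb), ca, cb)
                  for ca, cb in zip(det_a, det_b))
    return ihaar2d(ll_f, det_f)
```

Because the Haar pair is perfectly invertible, fusing an image with itself returns the image unchanged, which is a quick sanity check for any transform-domain fusion rule.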
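For the pre-trained network based framework, one simple fusion strategy turns deep features into per-pixel weights via a channel-wise l1-norm activity map. This sketch is in the spirit of the VGG-19 framework above, but the exact normalization and feature shapes here are illustrative assumptions, not the thesis's precise method.

```python
import numpy as np

def fusion_weights(feat_a, feat_b, eps=1e-12):
    """Per-pixel fusion weights from deep features of shape (C, H, W).

    The activity map is the channel-wise l1-norm of each feature tensor;
    the weights are the normalized activities, so w_a + w_b == 1 everywhere.
    """
    act_a = np.abs(feat_a).sum(axis=0)
    act_b = np.abs(feat_b).sum(axis=0)
    w_a = act_a / (act_a + act_b + eps)  # eps avoids division by zero
    return w_a, 1.0 - w_a

def weighted_fusion(img_a, img_b, w_a, w_b):
    """Reconstruct the fused image as a per-pixel weighted sum of the sources."""
    return w_a * img_a + w_b * img_b
```

In the actual framework the weight maps come from multiple network layers and are upsampled to the source resolution; here a single feature level stands in for the whole hierarchy.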
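In the end-to-end direction, training is driven by loss functions rather than a hand-crafted strategy. The thesis's actual losses are not reproduced here; as a hedged illustration, a common form in fusion networks combines an intensity term with a gradient (edge-preservation) term, both of which are assumptions of this sketch.

```python
import numpy as np

def gradients(img):
    """Forward-difference image gradients along x and y."""
    return np.diff(img, axis=1), np.diff(img, axis=0)

def fusion_loss(fused, src_a, src_b, alpha=1.0, beta=1.0):
    """Illustrative fusion loss.

    Intensity term: pulls the fused image toward the element-wise maximum
    of the two sources. Gradient term: pulls the fused image's edges toward
    the strongest edge from either source.
    """
    target = np.maximum(src_a, src_b)
    l_int = np.mean((fused - target) ** 2)
    l_grad = 0.0
    for gf, ga, gb in zip(gradients(fused), gradients(src_a), gradients(src_b)):
        g_target = np.where(np.abs(ga) >= np.abs(gb), ga, gb)
        l_grad += np.mean((gf - g_target) ** 2)
    return alpha * l_int + beta * l_grad
```

Minimizing such a loss over a multi-modal training set is what lets a learnable network replace the manually designed fusion strategy of the auto-encoder models.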
Keywords/Search Tags:Image fusion, Representation learning, Deep learning, Sparse/Low-Rank representation, Neural network