Multi-Source Image Fusion Based On Deep Learning
Posted on: 2023-11-17 | Degree: Doctor | Type: Dissertation | Country: China
Candidate: Q H Han | Full Text: PDF | GTID: 1528306917979959
Subject: Pattern Recognition and Intelligent Systems

Abstract/Summary:

With advances in sensor technology, multi-source images have become easy to obtain and are widely used in security, medical imaging, computer vision, and autonomous driving. Multi-source images captured by multiple sensors contain rich scene information and provide cues for properties such as depth and occlusion that are difficult to recover from a single source image. In addition, the complementarity of multi-source images provides ample information for image quality enhancement and image super-resolution. Multi-source images therefore contribute to the generation of high-quality images and the extraction of hidden features, which in turn support scene analysis and understanding. However, a series of problems must be solved in multi-source image fusion. First, for multi-modal images, the key issue is handling the inconsistency in data distribution and brightness between paired images; traditional single-modal methods cannot be applied directly and perform poorly. Second, for multi-view images, the correspondence between views must be established. To solve these problems, we design several fusion methods that analyze the data distribution of the multi-source images and selectively fuse each source image pair into a fused image combining the advantages of its sources. In particular, guided by the feature characteristics of cross-spectral images in deep learning, we design methods that bridge the intensity gap between multi-spectral images for image fusion and depth estimation. More details are as follows:

1. Although light field images record rich scene information, namely the directions and intensities of the incoming light rays, disparity estimation remains challenging because of the narrow baseline between micro-lenses. To deal with this problem, we propose guided filtering-based data fusion for light field disparity estimation with L0 gradient minimization. Stereo disparity produces accurate disparity edges, while the defocus response yields smooth disparity information in homogeneous regions. We fuse the stereo disparity and the defocus response from light field data in a guided filtering framework with L0 gradient minimization. Experimental results on both synthetic and real light field datasets show that the proposed method achieves clearer edges and fewer disparity errors than state-of-the-art methods. A hedged sketch of this fusion step follows.
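As an illustration of the fusion idea only (not the dissertation's exact algorithm), the sketch below combines a stereo-based and a defocus-based disparity map using OpenCV's guided filter and L0 smoothing. The edge-based confidence weighting, the function name fuse_disparities, and all parameter values are our own assumptions.

```python
# Hedged sketch: fuse stereo- and defocus-based disparity cues with a guided
# filter and L0 gradient minimization (requires opencv-contrib-python).
import cv2
import numpy as np

def fuse_disparities(center_view, d_stereo, d_defocus,
                     radius=8, eps=1e-3, l0_lambda=0.02):
    # Guide-image edge strength decides which cue to trust: the stereo cue
    # near edges, the defocus cue in homogeneous regions (assumed scheme).
    gray = (cv2.cvtColor(center_view, cv2.COLOR_BGR2GRAY)
            if center_view.ndim == 3 else center_view)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    grad = cv2.magnitude(gx, gy)
    w = grad / (grad.max() + 1e-6)                      # weights in [0, 1]

    fused = (w * d_stereo + (1.0 - w) * d_defocus).astype(np.float32)

    # Guided filtering aligns disparity discontinuities with image edges.
    fused = cv2.ximgproc.guidedFilter(center_view, fused, radius, eps)
    # L0 gradient minimization flattens small fluctuations while keeping
    # sharp disparity edges (positional args: src, dst, lambda).
    fused = cv2.ximgproc.l0Smooth(fused, None, l0_lambda)
    return fused
```

The guided filter uses the central sub-aperture image as the guide so that fused disparity edges follow true image edges; the subsequent L0 step suppresses the residual staircase noise that linear filtering leaves in flat regions.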
2. In low-light conditions, visible images have a low dynamic range with severe noise and degraded color, while near-infrared (NIR) images contain clear textures, free of noise but lacking color. Existing image fusion methods suffer from the low contrast of visible images and the flash-like effect of NIR images. Therefore, we adopt an unsupervised U-Net to achieve deep selective fusion of multi-scale features. First, we use a pretrained VGG to extract features from the visible image. Second, we build an encoding network to obtain edge information from the NIR image. Finally, we combine all features and feed them into a decoding network for fusion. Experimental results demonstrate that the proposed fusion network produces visually pleasing results with fine details, little noise, and natural color, and that it is superior to state-of-the-art methods in both visual quality and quantitative measurements. A hedged sketch of such a fusion network follows.
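The PyTorch sketch below illustrates the described pipeline under stated assumptions: a frozen VGG-19 (up to relu2_2) extracts visible-image features, a small trainable encoder extracts NIR edge features, and a decoder fuses the concatenation. The class name, layer choices, and channel widths are illustrative, not the dissertation's configuration.

```python
# Minimal sketch of selective visible/NIR fusion (assumed architecture).
import torch
import torch.nn as nn
from torchvision import models

class SelectiveFusionNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Pretrained VGG-19 up to relu2_2 extracts visible-image features.
        vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
        self.vgg_feats = nn.Sequential(*list(vgg.features[:9])).eval()
        for p in self.vgg_feats.parameters():
            p.requires_grad = False           # frozen feature extractor

        # Small trainable encoder pulls edge/texture features from the
        # single-channel NIR image (layer sizes are our assumption).
        self.nir_encoder = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Decoder fuses the concatenated features back into an RGB image.
        self.decoder = nn.Sequential(
            nn.Conv2d(128 + 128, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, visible_rgb, nir):
        f_vis = self.vgg_feats(visible_rgb)   # (B, 128, H/2, W/2) at relu2_2
        f_nir = self.nir_encoder(nir)         # (B, 128, H/2, W/2)
        return self.decoder(torch.cat([f_vis, f_nir], dim=1))
```

Since the fusion is unsupervised, training would compare the output against both inputs using, for example, perceptual and structural-similarity losses; the dissertation's actual loss functions are not specified here.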
3. The spectral gap between visible and NIR images causes dissimilar intensities due to the reflection properties of objects and materials, so traditional stereo matching cannot be applied directly to cross-spectral disparity estimation. To solve this problem, we propose cross-spectral disparity estimation from paired visible and NIR images using a disentangled representation built on a novel reversible structure. The reversible structure decomposes features into scene and style components to bridge the spectral gap between visible and NIR images. We perform stereo matching on the scene component with a 3D convolutional neural network to obtain an initial disparity map. To produce clear edges in the disparity map, we use a semantic segmentation network as auxiliary information to refine the initial estimate. Experimental results demonstrate that the proposed method achieves accurate disparity edges along object boundaries and outperforms state-of-the-art methods in both visual comparison and quantitative measurements. A sketch of the cost-volume matching stage appears below.
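The sketch below shows one common way to realize the 3D-CNN stereo-matching stage on the spectral-invariant scene features: a shifted-feature cost volume regularized by 3D convolutions with a soft-argmin readout, in the style of GC-Net. This is an assumed stand-in, not the dissertation's exact network; the class name CostVolumeStereo and the feat_ch/max_disp values are hypothetical.

```python
# Assumed sketch: cost-volume stereo matching on "scene" features with a
# 3D CNN (GC-Net style); not the dissertation's exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CostVolumeStereo(nn.Module):
    def __init__(self, feat_ch=32, max_disp=48):
        super().__init__()
        self.max_disp = max_disp
        self.regularizer = nn.Sequential(     # 3D conv regularization
            nn.Conv3d(2 * feat_ch, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(32, 1, 3, padding=1),
        )

    def forward(self, f_left, f_right):
        B, C, H, W = f_left.shape
        # Concatenate left features with right features shifted by each
        # candidate disparity to build a (B, 2C, D, H, W) cost volume.
        volume = f_left.new_zeros(B, 2 * C, self.max_disp, H, W)
        for d in range(self.max_disp):
            volume[:, :C, d, :, d:] = f_left[:, :, :, d:]
            volume[:, C:, d, :, d:] = f_right[:, :, :, :W - d]
        cost = self.regularizer(volume).squeeze(1)     # (B, D, H, W)
        # Soft-argmin over disparity gives a differentiable estimate.
        prob = F.softmax(-cost, dim=1)
        disps = torch.arange(self.max_disp, device=cost.device,
                             dtype=prob.dtype).view(1, -1, 1, 1)
        return (prob * disps).sum(dim=1)               # (B, H, W)
```

Because matching runs on the disentangled scene component, the same cost-volume machinery used for single-spectrum stereo applies, with the spectral gap already absorbed by the representation.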
4. The reference picture resampling (RPR) functionality in Versatile Video Coding (VVC) saves bits and reduces complexity by downsampling the input video before encoding and upsampling the decoded video at the decoder side. While RPR effectively reduces transmission bandwidth, it degrades video quality. Therefore, we propose a convolutional neural network (CNN) filter for RPR-based super-resolution guided by partition information. We propose a reference spatial attention block (RSAB) that removes blocking artifacts based on the partition information generated from the decoded frame. To make full use of the correlation between luma and chroma, we use a U-Net backbone that extracts and fuses multi-scale features from an image. Within the U-Net backbone, we design a dilated convolution-based dense block with channel attention; a hedged sketch of one plausible form of this block appears after the keyword list below. The proposed CNN filter achieves {-9.25%, 8.82%, -16.39%} and {-4.67%, -1.75%, -11.70%} BD-rate changes under the all-intra (AI) and random-access (RA) configurations, respectively.

Keywords/Search Tags: Light field, disparity estimation, near-infrared image, image fusion, cross-spectral image, stereo matching, reference picture resampling, super-resolution
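For contribution 4 above, the block below sketches one plausible form of a dilated-convolution dense block with channel attention (squeeze-and-excitation style). The growth rate, dilation rates, and reduction ratio are assumptions, not values from the dissertation.

```python
# Assumed sketch of a dilated-convolution dense block with channel attention.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (assumed variant)."""
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.mlp(x)                # channel-wise reweighting

class DilatedDenseBlock(nn.Module):
    def __init__(self, ch=64, growth=32, dilations=(1, 2, 4)):
        super().__init__()
        layers, in_ch = [], ch
        for d in dilations:                   # dense connectivity: each layer
            layers.append(nn.Sequential(      # sees all previous features
                nn.Conv2d(in_ch, growth, 3, padding=d, dilation=d),
                nn.ReLU(inplace=True)))
            in_ch += growth
        self.layers = nn.ModuleList(layers)
        self.fuse = nn.Conv2d(in_ch, ch, 1)   # 1x1 conv back to ch channels
        self.attn = ChannelAttention(ch)

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return x + self.attn(self.fuse(torch.cat(feats, dim=1)))  # residual
```

Increasing dilation rates widen the receptive field without extra downsampling, which suits artifact removal around large coding-block boundaries; the residual connection keeps the block safe to stack inside a U-Net backbone.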