
Research On Feature Extraction And Fusion Methods Of Infrared And Visible Light Images

Posted on: 2024-09-01    Degree: Doctor    Type: Dissertation
Country: China    Candidate: J W Li    Full Text: PDF
GTID: 1528307112450674    Subject: Computer Science and Technology
Abstract/Summary:
Infrared and visible image fusion aims to combine the infrared and visible modal expressions of the same imaging scene, so as to generate a fused image that is rich in information and fully expresses both modalities. It is widely applied in fields including military reconnaissance, remote sensing measurement, security monitoring, agricultural pest detection, and target tracking. Traditional infrared and visible image fusion methods design fusion rules adapted to complex scenes through manually designed activity-level measurement and feature extraction. Although their fusion effect is good, information loss remains a problem. Deep-learning-based fusion methods exploit the powerful feature expressiveness of neural networks to perform feature extraction, feature fusion, and image reconstruction on the source images, and use a loss function during training to constrain the fused image to retain the important information of the source images. Deep-learning-based fusion has achieved relatively ideal results, but there is still much room for improvement in training dataset generation, model construction, and loss function design. Based on the above discussion, this dissertation explores feature extraction and fusion methods for infrared and visible images. The main research contents are summarized as follows:

(1) To address the image distortion, edge blurring, and Gibbs phenomenon of the traditional wavelet transform, as well as the loss of fine image features in the Non-Subsampled Shearlet Transform, this dissertation proposes a two-level multi-scale feature decomposition fusion method for infrared and visible light images based on the Lifting Stationary Wavelet Transform (LSWT) and the Non-Subsampled Shearlet Transform (NSST). Firstly, the method uses the NSST and LSWT algorithms to decompose the infrared and visible images into first- and second-level multi-scale features of high-frequency and low-frequency information, respectively. Secondly, considering the characteristics of infrared and visible images and the feature expression of the high- and low-frequency sub-bands, different fusion rules are designed for each: in the low-frequency part, the Discrete Cosine Transform (DCT) and Local Spatial Frequency (LSF) are introduced and an LSF adaptive weighted fusion rule in the DCT domain is adopted; in the high-frequency part, an improved regional contrast fusion strategy is proposed in combination with human visual characteristics. Thirdly, an ablation experiment verifies the rationality and efficiency of the proposed method. Subjective and objective experiments on public datasets compare it with mainstream infrared and visible image fusion methods, verifying that images fused by the LSWT-NSST method have clear edges, prominent targets, good visual perception, and overall better performance.
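To make the low-frequency rule concrete, here is a minimal sketch of block-wise LSF-weighted fusion in the DCT domain. The block size, the epsilon, and the exact LSF definition (the standard spatial-frequency formula) are assumptions for illustration; the dissertation's actual rule may differ in detail.

```python
# Hedged sketch: block-wise DCT-domain fusion of two low-frequency
# sub-bands, weighted by each block's local spatial frequency (LSF).
# Block size, epsilon, and the LSF definition are assumptions here.
import numpy as np
from scipy.fft import dctn, idctn

def local_spatial_frequency(block: np.ndarray) -> float:
    """SF = sqrt(RF^2 + CF^2) over one block (standard definition)."""
    rf = np.sqrt(np.mean(np.diff(block, axis=1) ** 2))  # row frequency
    cf = np.sqrt(np.mean(np.diff(block, axis=0) ** 2))  # column frequency
    return float(np.sqrt(rf ** 2 + cf ** 2))

def fuse_lowfreq_dct(low_ir: np.ndarray, low_vis: np.ndarray,
                     block: int = 8, eps: float = 1e-8) -> np.ndarray:
    """Fuse two equally sized low-frequency sub-bands block by block.
    For simplicity, dimensions are assumed divisible by `block`."""
    h, w = low_ir.shape
    fused = np.zeros_like(low_ir, dtype=np.float64)
    for i in range(0, h, block):
        for j in range(0, w, block):
            a = low_ir[i:i + block, j:j + block].astype(np.float64)
            b = low_vis[i:i + block, j:j + block].astype(np.float64)
            # Adaptive weight from the two blocks' activity levels.
            sa, sb = local_spatial_frequency(a), local_spatial_frequency(b)
            wa = (sa + eps) / (sa + sb + 2 * eps)
            # Weighted average of DCT coefficients, then inverse DCT.
            coef = wa * dctn(a, norm='ortho') + (1 - wa) * dctn(b, norm='ortho')
            fused[i:i + block, j:j + block] = idctn(coef, norm='ortho')
    return fused
```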
(2) Previous deep-learning-based fusion methods that extract features with a single receptive field, GAN-based methods with a single discriminator, and other mainstream fusion methods all fail to distinguish the differing information of multi-modal images, resulting in information loss. Therefore, this dissertation proposes a generative adversarial network based on multi-receptive-field feature transfer and deep-attention-mechanism feature fusion for infrared and visible image fusion. Firstly, the method employs three classical convolution kernels on the cascaded source images to extract the deep multi-scale, multi-receptive-field features of the multi-source images. Secondly, a multi-scale deep attention fusion mechanism is designed, which describes the importance of the features extracted at each receptive-field level from both the spatial-attention and channel-attention directions and integrates them according to their attention levels. Thirdly, the method interacts the multi-receptive-field features in the encoder with the deep features in the decoder, enhancing feature transfer while better achieving feature reuse. Fourthly, a dual-discriminator network structure forces the generated image to retain both the intensity of the infrared image and the details of the visible light image. A model ablation experiment verifies the necessity and effectiveness of the deep attention mechanism and the multi-level convolution kernel feature extraction. Qualitative and quantitative experiments on three public datasets show that the proposed model achieves fusion performance comparable to other mainstream fusion methods in both subjective vision and objective index measurement.
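As one way to picture the encoder's first stage, the following PyTorch sketch extracts features with three parallel receptive fields and reweights them with simple channel and spatial attention. The kernel sizes (3, 5, 7), channel counts, and attention form are assumptions; the dissertation's deep attention fusion mechanism is more elaborate.

```python
# Hedged PyTorch sketch of multi-receptive-field feature extraction with
# simple channel/spatial attention fusion; all sizes are illustrative.
import torch
import torch.nn as nn

class MultiRFBlock(nn.Module):
    def __init__(self, in_ch: int = 2, out_ch: int = 32):
        super().__init__()
        # Three parallel branches with increasing receptive fields.
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2) for k in (3, 5, 7)
        ])
        # Channel attention: global pooling -> per-channel weights.
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(3 * out_ch, 3 * out_ch, 1), nn.Sigmoid(),
        )
        # Spatial attention: one weight map over the concatenated features.
        self.spatial_att = nn.Sequential(
            nn.Conv2d(3 * out_ch, 1, 7, padding=3), nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(3 * out_ch, out_ch, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: cascaded infrared + visible images, shape (B, 2, H, W).
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        feats = feats * self.channel_att(feats)   # reweight channels
        feats = feats * self.spatial_att(feats)   # reweight locations
        return self.fuse(feats)                   # integrate the branches

ir, vis = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
features = MultiRFBlock()(torch.cat([ir, vis], dim=1))  # (1, 32, 64, 64)
```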
(3) Most previous deep-learning-based image fusion methods use a single convolution kernel to extract deep features and lack complementary source-image inputs to the dual discriminators, which inevitably causes information loss during feature transfer and the adversarial game. Additionally, most infrared and visible image fusion methods operate only on grayscale images. Therefore, this dissertation proposes a generative adversarial network based on multi-scale feature transfer and dual discriminators for infrared and color visible light image fusion. Firstly, the method uses multiple receptive fields to extract the multi-scale, multi-level deep features of the multi-modal images on three feature channels. Secondly, a feature interaction module is introduced into the encoder to realize information interaction and pre-fusion of features between channels. Thirdly, a new gradient penalty term is introduced to strengthen the Lipschitz constraint and thereby improve the training performance and stability of the model (a sketch of one such penalty follows this abstract). Fourthly, a generative adversarial network with dual discriminators better realizes the adversarial game between one generator and two discriminators. The model's ablation experiment verifies the necessity and effectiveness of the multi-receptive-field feature transfer module and the primary and secondary content losses. Qualitative and quantitative analysis on two public grayscale infrared and visible datasets and one public infrared and color visible dataset shows that the method's fusion results have better subjective visual and objective performance and are superior to other mainstream fusion methods.

(4) Most mainstream image fusion methods only conduct subjective and objective verification experiments. To verify how image fusion facilitates subsequent target detection tasks, this dissertation designs a target detection experiment. Based on the LLVIP infrared and visible image detection dataset, a comparative pedestrian and vehicle detection experiment covers the three fusion methods proposed in this dissertation and four mainstream fusion methods in the current field. Qualitative and quantitative detection analysis verifies that the three proposed fusion methods effectively improve the accuracy of pedestrian and vehicle detection and promote subsequent high-level computer vision tasks; the adaptability of the three fusion methods is also analyzed.
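For the Lipschitz constraint mentioned in contribution (3), the following is a hedged sketch of a WGAN-GP-style gradient penalty, one common way to implement such a term; the dissertation's new penalty may differ in its exact formulation.

```python
# Hedged sketch of a WGAN-GP-style gradient penalty enforcing a
# Lipschitz constraint on a discriminator; the dissertation's new
# penalty term may be applied differently.
import torch

def gradient_penalty(disc: torch.nn.Module, real: torch.Tensor,
                     fake: torch.Tensor, weight: float = 10.0) -> torch.Tensor:
    """Penalize deviation of the discriminator's gradient norm from 1."""
    b = real.size(0)
    alpha = torch.rand(b, 1, 1, 1, device=real.device)
    # Random points on the lines between real and generated samples.
    mixed = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    scores = disc(mixed)
    grads = torch.autograd.grad(
        outputs=scores, inputs=mixed,
        grad_outputs=torch.ones_like(scores),
        create_graph=True, retain_graph=True,
    )[0]
    norms = grads.reshape(b, -1).norm(2, dim=1)
    return weight * ((norms - 1.0) ** 2).mean()
```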
Keywords/Search Tags:Computer Vision, Infrared light, Visible light, Image Fusion, Feature Extraction, Object Detection