Image fusion has attracted massive research attention in computer vision; it aims to combine multi-modal images of the same scene into a single image that describes the scene accurately and comprehensively. Among the various source-image types, the fusion of infrared and visible images best matches the practical needs of public security. Infrared sensors reflect target information by capturing thermal radiation in the scene and can identify infrared targets such as people and vehicles even under severe conditions like low illumination, occlusion, and concealment. However, infrared images usually suffer from low resolution and poor scene texture detail. By contrast, visible sensors gather the light reflected by objects; visible images suit human visual perception because of their high spatial resolution and rich texture information, but they are easily affected by rain, fog, and other environmental factors, so their quality is unstable. Fusing these two complementary image types can therefore improve information utilization and provide a better guarantee for the prevention and handling of public-security incidents. At present, the key applications of infrared and visible image fusion in public security include the identification, monitoring, and early warning of latent risks, and the technology increasingly serves practical police departments such as anti-terrorism, public security, anti-drug, and border management. In recent years, the advantages of deep learning in feature extraction and data representation have become increasingly prominent and have been widely studied across computer-vision tasks. In particular, Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs) are two representative deep learning frameworks that have become major players in infrared and visible image fusion research. However, most CNN- and GAN-based fusion methods suffer from insufficient extraction of local information, long-range dependency information, and semantic information. To tackle these challenges, this dissertation designs three fusion algorithms based on multi-level information preservation:

(1) To address the insufficient extraction of local information, a fusion algorithm based on a GAN and guided filtering (DSG-Fusion) is proposed. DSG-Fusion introduces a guided filter into the generator structure so that the generator extracts deeper background information and adds more texture detail to the fused results. In addition, considering the modal difference between the source images, two independent data streams are designed to extract features from each source separately, so more representative features can be learned. Furthermore, two discriminators encourage the fused image to stay close to both source images simultaneously, and a DSG loss consisting of intensity and structural-similarity terms is designed to constrain the training of the network. Parameter analysis and ablation experiments verify that the proposed guided-filter module and dual-stream architecture significantly improve the fusion effect. Extensive experiments on two public datasets also demonstrate that DSG-Fusion preserves the most texture detail and edge information compared with seven other approaches; however, this algorithm does not adequately consider long-range dependency information.

(2) To address the insufficient extraction of long-range dependency information, a Transformer-based fusion algorithm (DGLT-Fusion) is proposed. In the DGLT-Fusion architecture, long-range Transformer modules and local CNN modules are stacked in an interleaved, densely connected fashion. Long-range dependency learning and local feature extraction are thus decoupled into the processing of these two modules, so the extracted source information can be fused more adequately. Parameter analysis and ablation experiments verify the effectiveness of introducing the Transformer into the fusion task and the advantages of the decoupled network structure. Comparative experiments further demonstrate that DGLT-Fusion outperforms other algorithms in global information preservation, but it pays little attention to semantic image information.

(3) To address the insufficient extraction of semantic information, a fusion algorithm based on semantic perception (SePT) is proposed. SePT extracts local features and long-range dependencies through CNN-based and Transformer-based modules, and additionally designs two Transformer-based semantic modeling modules to handle high-level semantic information: one maps the shallow features of the source images into deep semantic features, while the other learns deep semantic information under different receptive fields. The final fused results are recovered from the combination of local features, long-range dependencies, and semantic features. Extensive comparison experiments demonstrate the superiority of SePT in semantic information preservation over other advanced fusion approaches.

Finally, the three fusion algorithms are compared on the M3FD dataset. By comparing fusion quality and object-detection performance, the application prospects of each algorithm are analyzed to provide a technical basis for public-security departments.
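As an illustration of the intensity-plus-structural-similarity objective described for DSG-Fusion, the sketch below combines a pixel-intensity fidelity term with a simplified single-window SSIM term. The weighting factor `alpha`, the equal averaging over both source images, and the function names are assumptions made for illustration; the dissertation's exact loss formulation may differ.

```python
import numpy as np

def global_ssim(x, y, c1=0.01**2, c2=0.03**2):
    """Simplified SSIM computed over the whole image (no sliding window)."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2))

def dsg_style_loss(fused, ir, vis, alpha=10.0):
    """Illustrative fusion loss: intensity fidelity + structural similarity.

    `alpha` balances the two terms; both terms average equally over the
    infrared and visible sources (an assumption, not the thesis's exact form).
    """
    # Intensity term: mean squared deviation from each source image.
    intensity = 0.5 * (np.mean((fused - ir) ** 2) + np.mean((fused - vis) ** 2))
    # Structural term: 1 - SSIM against each source (0 when perfectly similar).
    structure = 0.5 * ((1 - global_ssim(fused, ir)) + (1 - global_ssim(fused, vis)))
    return intensity + alpha * structure
```

When the fused image equals both sources, both terms vanish and the loss is zero; any intensity or structural deviation from either source increases it, which is the behavior a fusion network is trained to minimize.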