
Research On Fusion And Target Recognition Of Infrared And Visible Light Images Based On Deep Learning

Posted on: 2024-05-10 | Degree: Master | Type: Thesis
Country: China | Candidate: Y H Zhang | Full Text: PDF
GTID: 2558307181451274 | Subject: Instrument Science and Technology
Abstract/Summary:
Image-based object recognition is an important component of autonomous driving systems and intelligent traffic management systems. The object recognition subsystem must provide accurate traffic-object information in real time to support decision-making and other downstream subsystems. However, traffic object recognition in complex environments is challenging, as targets vary in scale and shape. In addition, in poorly lit environments visible-light imaging devices struggle to capture enough image information about targets, so recognition performance falls short of practical requirements. Infrared and visible-light images carry complementary information, and exploiting both modalities can improve the system's all-weather operability. This paper therefore studies deep-learning-based infrared and visible-light image fusion and object recognition, taking high real-time performance, strong robustness, and high accuracy as design criteria. A multi-stage cross-modal feature-fusion object recognition network based on YOLOv5s is proposed, and the recognition accuracy for small targets is further improved on top of this fusion network. The main research contents of this paper are as follows:

(1) An end-to-end network is designed for infrared and visible-light feature extraction, feature fusion, and object recognition. Building on YOLOv5s, the CSPDarkNet53 backbone is extended to a dual-stream network that extracts infrared and visible-light image features separately. According to the characteristics of the infrared and visible-light feature maps at different stages, the feature maps are fused at the 8x, 16x, and 32x downsampling stages, and the YOLOv5s detection head operates on the fused feature maps to localize and classify targets.

(2) Two feature-fusion modules are constructed to match the characteristics of the infrared and visible-light feature maps at different feature-extraction stages: the VIAF module, based on feature-map addition, and the VICF module, based on channel concatenation. The VIAF module highlights the information common to the two modalities by allocating weights through attention mechanisms and summing the modal feature maps. The VICF module strengthens the modality-specific information that is more favorable for recognition by applying a channel attention mechanism after concatenating the modal feature maps. The VIAF module fuses the 8x downsampled feature maps, while the VICF module fuses the 16x and 32x downsampled feature maps, yielding an end-to-end network that completes the three sub-tasks of feature extraction, feature fusion, and target recognition. Experimental results show that this cross-modal feature-fusion recognition network improves recognition accuracy.

(3) To remedy the cross-modal feature-fusion network's weakness in recognizing small-scale targets, a small-scale detection head is added; the FPN and PAN structures are used to fuse the lower-level 4x downsampled feature map; and lateral feature connections from the feature-extraction network to the detection end enhance the features conducive to small-target recognition. The loss function is also optimized to reduce the failure of the position loss on small targets. In addition, a data-augmentation strategy that simulates small-target scenes is proposed to enrich the number of small targets in the dataset and improve the small-target recognition rate during training. Experimental results show that these improvement strategies strengthen the network's ability to recognize small targets.
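The multi-stage architecture in (1) can be illustrated at the shape level. The sketch below is a minimal NumPy mock-up, not the thesis implementation: the backbone is simulated by strided subsampling instead of CSPDarkNet53, and the placeholder elementwise-mean fusion stands in for the attention-based VIAF/VICF modules described in (2). It only shows how two modal streams produce feature maps at the 8x, 16x, and 32x stages that are fused before detection.

```python
import numpy as np

def backbone(img):
    """Toy single-stream backbone: returns feature maps at the 8x, 16x,
    and 32x downsampling stages (simulated here by strided subsampling)."""
    return {s: img[::s, ::s] for s in (8, 16, 32)}

def fuse(ir_feat, vis_feat):
    """Placeholder fusion: elementwise mean of the two modal maps.
    The thesis uses attention-based VIAF/VICF modules instead."""
    return 0.5 * (ir_feat + vis_feat)

# Dual-stream extraction on a 256x256 infrared / visible-light pair.
ir = np.random.rand(256, 256)
vis = np.random.rand(256, 256)
ir_feats, vis_feats = backbone(ir), backbone(vis)

# Fuse at each downsampling stage; a YOLOv5s-style detection head would
# then run on each fused map to localize and classify targets.
fused = {s: fuse(ir_feats[s], vis_feats[s]) for s in (8, 16, 32)}
for s, f in fused.items():
    print(s, f.shape)   # 8 -> (32, 32), 16 -> (16, 16), 32 -> (8, 8)
```

Fusing inside the backbone at several stages (rather than once at the input) lets each detection scale see cross-modal features matched to its receptive field.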
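The two fusion modules in (2) can be sketched as follows. The module names VIAF and VICF come from the abstract, but their internal structure is not specified there, so the sigmoid gating in the additive path and the SE-style (squeeze-and-excitation) channel attention in the concatenation path are assumptions for illustration; the weight arguments `w` stand in for learned parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def viaf(ir, vis, w):
    """VIAF-style additive fusion (sketch): a per-channel attention weight
    gates each modality before summation, emphasizing information the two
    modalities share. Feature maps are (C, H, W); w is a (C,) parameter."""
    a = sigmoid(w)[:, None, None]          # attention weights in (0, 1)
    return a * ir + (1.0 - a) * vis

def vicf(ir, vis, w):
    """VICF-style concatenation fusion (sketch): concatenate along the
    channel axis, then reweight channels with SE-style channel attention
    so modality-specific cues favorable to recognition are amplified.
    w is a (2C, 2C) parameter of the excitation step."""
    x = np.concatenate([ir, vis], axis=0)  # (2C, H, W)
    squeeze = x.mean(axis=(1, 2))          # global average pool per channel
    excite = sigmoid(w @ squeeze)          # (2C,) channel weights
    return x * excite[:, None, None]

C, H, W = 4, 8, 8
ir, vis = np.random.rand(C, H, W), np.random.rand(C, H, W)
print(viaf(ir, vis, np.zeros(C)).shape)    # (4, 8, 8)
print(vicf(ir, vis, np.eye(2 * C)).shape)  # (8, 8, 8)
```

Note the difference in what each path preserves: addition keeps the channel count and blends the modalities, which suits the shallow 8x stage, while concatenation doubles the channels and lets attention keep modality-specific evidence, which suits the deeper 16x and 32x stages.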
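The abstract does not detail the data-augmentation strategy in (3). One common way to "simulate small-target scenes" and enrich the number of small targets, sketched here purely as an assumed stand-in, is copy-paste augmentation: crop existing object patches, shrink them, and paste the copies at random positions while adding the corresponding bounding boxes.

```python
import random
import numpy as np

def paste_small_targets(img, boxes, n_copies=2, scale=0.5, rng=None):
    """Copy-paste augmentation sketch: duplicate existing object patches
    at reduced scale and random positions to enrich small targets.
    `img` is (H, W) grayscale; `boxes` are (x, y, w, h) in pixels.
    NOTE: an illustrative stand-in -- the thesis's exact strategy for
    simulating small-target scenes is not specified in the abstract."""
    rng = rng or random.Random(0)
    H, W = img.shape
    out_boxes = list(boxes)
    for _ in range(n_copies):
        x, y, w, h = rng.choice(boxes)
        patch = img[y:y + h, x:x + w]
        # Shrink the patch by subsampling to simulate a smaller target.
        step = max(1, int(round(1 / scale)))
        small = patch[::step, ::step]
        sh, sw = small.shape
        nx, ny = rng.randrange(0, W - sw), rng.randrange(0, H - sh)
        img[ny:ny + sh, nx:nx + sw] = small
        out_boxes.append((nx, ny, sw, sh))
    return img, out_boxes

img = np.zeros((64, 64))
img[10:26, 10:26] = 1.0                       # one 16x16 "object"
aug, boxes = paste_small_targets(img, [(10, 10, 16, 16)])
print(len(boxes))                             # 3 (1 original + 2 pasted)
```

Because the pasted copies are smaller than any naturally occurring target, the loss on the small-scale detection head receives many more positive examples per image during training.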
Keywords/Search Tags: Deep Learning, Target Recognition, Cross-Modal Feature Fusion, Attention Mechanism