| Integrated industrial and defense applications require unmanned systems that operate accurately in all weather conditions and can simultaneously detect and track targets in complex scenes. In recent years, visual target tracking has been widely used in robotics, autonomous vehicles, human-machine interface devices, and video surveillance equipment. Depending on the input source, tracking methods can be broadly divided into visible image target tracking, infrared image target tracking, and visible-infrared fusion target tracking. Visible image tracking is currently the most common, but its performance degrades easily in complex environments. This paper addresses the challenges of unknown environments in order to improve tracking stability, accuracy, and real-time performance, and carries out research on visible and infrared image fusion and on object tracking and recognition. The main research content is as follows: (1) A study of multi-modality feature extraction methods. RGB and IR data contain different features of the same scene, and their complementary advantages overcome the limitations of single-source imaging. How to exploit this complementarity effectively is a crucial issue in improving tracking and detection accuracy. Because the two modalities differ in the richness of the information they carry, this paper proposes a multi-level extraction network that extracts feature information at different depths from the different modalities. (2) A multi-modality feature complementary mechanism based on attention is proposed according to the characteristics of infrared and visible-light features. IR images are more stable because IR imaging devices are not disturbed by lighting, smoke, or haze. In this paper, shallow infrared features are used as contextual information to direct attention to the shared features in RGB, which exploits both the stability of infrared images and the richer features of RGB images. The RGB features emphasized by this contextual information are then fused with the IR features, so that both the features shared across modalities and the modality-specific features are used and the complementary advantages of the two modalities are fully exploited. (3) A visible video-based re-identification module is proposed to address tracking drift caused by the tracking mechanism. Inspired by the fully convolutional Siamese network and correlation filtering algorithms, this paper designs a region re-identification module. When a complex environment causes tracking drift, this network re-predicts the target center and thereby improves tracking accuracy. |
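
As an illustration of the idea behind item (3), the following minimal sketch shows how a cross-correlation response, the matching step that fully convolutional Siamese trackers build on, can be used to re-predict a target center inside a search region. It is not the module proposed in this paper: the function names `cross_correlate` and `repredict_center` are hypothetical, and the inputs are assumed to be single-channel feature maps stored as nested lists, whereas a real tracker would correlate multi-channel embeddings produced by a learned network.

```python
from typing import List, Tuple

Map2D = List[List[float]]  # assumption: single-channel feature map


def cross_correlate(search: Map2D, template: Map2D) -> Map2D:
    """Slide the template over the search map and return the response map."""
    th, tw = len(template), len(template[0])
    sh, sw = len(search), len(search[0])
    response: Map2D = []
    for y in range(sh - th + 1):
        row = []
        for x in range(sw - tw + 1):
            # Inner product between the template and the window at (y, x).
            score = sum(
                search[y + i][x + j] * template[i][j]
                for i in range(th)
                for j in range(tw)
            )
            row.append(score)
        response.append(row)
    return response


def repredict_center(search: Map2D, template: Map2D) -> Tuple[int, int]:
    """Return (row, col) of the center of the best-matching window."""
    response = cross_correlate(search, template)
    positions = (
        (y, x) for y in range(len(response)) for x in range(len(response[0]))
    )
    # The peak of the response map marks the top-left corner of the best match.
    best_y, best_x = max(positions, key=lambda p: response[p[0]][p[1]])
    th, tw = len(template), len(template[0])
    # Shift from the window's top-left corner to its center.
    return best_y + th // 2, best_x + tw // 2
```

In this sketch the re-predicted center is simply the peak of the response map; how the proposed module decides that drift has occurred, and how it refines the peak, is not specified here and is left out.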