| RGBT target tracking aims to effectively fuse visible and thermal infrared video sequences. There are two main reasons why RGBT target tracking can achieve efficient all-weather monitoring. On the one hand, thermal infrared information strongly compensates for the appearance features of visible-light targets under poor lighting conditions. On the other hand, visible-light information helps resolve the thermal crossover problem encountered in thermal infrared image processing. Although the field has produced many research results, existing methods still face challenges such as dynamic occlusion and similar backgrounds. With the rapid development of artificial intelligence and big data processing technology, deep learning has moved toward cross-task and multi-modal learning. This trend can provide scene information for addressing the inherent difficulties of RGBT target tracking. To make effective use of multi-task scene information, this paper studies RGBT target tracking methods for complex scenes at three levels: feature, update strategy and attention. The research innovations of this paper are as follows: (1) Feature level: Traditional RGBT target tracking methods focus only on learning the feature representation of the target to be tracked. To address this limitation, a target tracking method based on scene consistency is proposed. The method is motivated by the observation that strengthening the consistency of global reasoning across modalities helps improve the robustness of target features and alleviates the incomplete modal information caused by complex backgrounds. Based on this, under a multi-task learning framework, a nested global inference model is designed to regulate the consistency of scene perception across image domains (reasoning about the relationship between the target and the surrounding semantic
regions). Extensive experiments on three datasets verify that the designed RGBT target tracking method handles target occlusion in complex scenes better and significantly improves the robustness of target tracking. (2) Attention level: Traditional target tracking based on convolutional neural networks cannot effectively use context information to fuse multi-modal attention. To address this problem, an RGBT target tracking method based on a dual-branch cross self-attention mechanism is proposed. The proposed model introduces a dual-branch Transformer module to encode the infrared and visible-light targets separately; on top of this parallel dual-branch encoding, a cross-attention mechanism aligns the attention of the two branches, thereby improving the accuracy and stability of target tracking. Extensive experiments on three datasets verify the superiority of the proposed method. (3) Update strategy level: During target tracking, the online update method directly determines the robustness of long-term tracking. Given the importance of the update strategy, this paper proposes a cross-modal feature update method. Specifically, a meta-learning mechanism is first introduced to update the nested global inference model and the dual-branch Transformer module using scarce online samples. Second, unsupervised learning is added to the meta-learning. The unsupervised component draws on mask-based data augmentation to generate modality-mixed negative samples for an unsupervised triplet loss, so that the tracking model can use both modalities simultaneously for unsupervised training and better discriminate salient objects. |
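
The cross-attention alignment described at the attention level can be illustrated with a minimal sketch. It is not the model proposed in this paper: the abstract gives no architectural details, so the sketch assumes plain scaled dot-product attention without learned projections, and the names `cross_attention`, `align_branches`, `rgb_tokens` and `tir_tokens` are hypothetical. Each branch uses its own tokens as queries and the other branch's tokens as keys and values, so every output token is a weighted mixture of the other modality's features.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Scaled dot-product attention of one branch over the other.

    Assumption: no learned query/key/value projections are applied; the
    tokens of the other branch serve directly as both keys and values.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        # Similarity of this query token to every token of the other branch.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in keys_values
        ]
        weights = softmax(scores)
        # Weighted mixture of the other branch's tokens.
        out.append(
            [sum(w * v[j] for w, v in zip(weights, keys_values)) for j in range(d)]
        )
    return out


def align_branches(rgb_tokens, tir_tokens):
    """Let each modality attend to the other (hypothetical helper)."""
    rgb_aligned = cross_attention(rgb_tokens, tir_tokens)
    tir_aligned = cross_attention(tir_tokens, rgb_tokens)
    return rgb_aligned, tir_aligned


# Toy token features: 3 visible-light tokens and 2 thermal infrared tokens.
rgb_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tir_tokens = [[0.5, 0.5], [2.0, 0.0]]
rgb_aligned, tir_aligned = align_branches(rgb_tokens, tir_tokens)
```

In a full Transformer block, this step would normally be combined with learned projections, multiple heads, residual connections and normalization; they are omitted here to keep the direction of information flow between the two branches visible.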