Object tracking is one of the most important tasks in computer vision: it aims to detect and track targets throughout an image sequence. It is applied in traffic monitoring, robotics, autonomous driving, and other fields. This paper studies tracking in complex scenarios and builds new deep-learning-based tracking architectures to overcome the shortcomings of existing tracking algorithms. The main contributions are as follows.

(1) An object tracking algorithm based on the Siamese network architecture is presented. Most existing trackers learn a single feature embedding that serves both the classification and the regression task, which makes it difficult to optimize the two tasks at the same time. To address this, this paper decouples classification and regression at the level of the model structure, emphasizing the independence of the two tasks so that both the correctness of foreground/background classification and the accuracy of bounding-box regression are taken into account. Specifically, two feature-extraction backbone networks split the model into two branches, which resolves the strong coupling of the two tasks in the feature-extraction stage of the conventional Siamese architecture.

(2) A double Siamese network architecture based on feature self-enhancement is proposed. Inspired by the core idea of the Transformer, this paper builds a feature enhancement module based on the multi-head attention mechanism on top of a plain double Siamese architecture. The module self-enhances the template and search-region features and establishes deep dependencies between the template and the search region, which greatly improves the representational ability of the double Siamese model.

(3) A double Siamese network architecture based on multi-branch information interaction is proposed. Building on the double Siamese architecture, this paper designs a multi-branch information interaction module based on the cross-attention mechanism. The module establishes information interaction channels among the branches and fully exploits the deep information dependencies between them to obtain higher-quality feature representations.

Experimental results show that the proposed network architectures greatly improve the accuracy of both classification and regression, offering a new perspective for improving tracker performance.
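To make the three ideas concrete, here is a minimal, untrained NumPy sketch of a single forward pass. It is not this paper's implementation. The 1x1-convolution stand-in backbone, the feature width `D=16`, the four attention heads, the depth-wise cross-correlation, the search-to-template attention used to model template/search dependence, the placement of each attention module, and all function names (`backbone`, `enhance`, `interact`, `track_step`, and so on) are hypothetical choices made for illustration. The sketch shows (1) two independent backbones, one for classification and one for regression; (2) multi-head self-attention on template and search features; and (3) cross-attention that lets the classification and regression branches exchange information.

```python
import numpy as np

rng = np.random.default_rng(0)  # random, untrained weights: illustration only

def backbone(img, w):
    """Hypothetical stand-in backbone: a 1x1 convolution (channel mixing) + ReLU.
    img: (C_in, H, W), w: (D, C_in) -> features (D, H, W)."""
    return np.maximum(np.einsum("oc,chw->ohw", w, img), 0.0)

def depthwise_xcorr(z, x):
    """Depth-wise cross-correlation: slide template z (D, h, w) over search x (D, H, W)."""
    _, h, w = z.shape
    _, H, W = x.shape
    out = np.empty((z.shape[0], H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            out[:, i, j] = np.sum(z * x[:, i:i + h, j:j + w], axis=(1, 2))
    return out

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))  # subtract max for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def multihead_attention(q_tok, kv_tok, heads, Wq, Wk, Wv, Wo):
    """Scaled dot-product multi-head attention; q_tok: (Nq, D), kv_tok: (Nk, D)."""
    Nq, D = q_tok.shape
    d = D // heads

    def split(t):  # (N, D) -> (heads, N, d)
        return t.reshape(t.shape[0], heads, d).transpose(1, 0, 2)

    Q, K, V = split(q_tok @ Wq), split(kv_tok @ Wk), split(kv_tok @ Wv)
    A = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(d))  # attention weights (heads, Nq, Nk)
    return (A @ V).transpose(1, 0, 2).reshape(Nq, D) @ Wo  # concatenate heads, project

def to_tokens(f):  # (D, H, W) -> (H*W, D)
    return f.reshape(f.shape[0], -1).T

def to_map(t, H, W):  # (H*W, D) -> (D, H, W)
    return t.T.reshape(-1, H, W)

def enhance(z, x, p_self, p_cross, heads):
    """(2) Feature self-enhancement: self-attention on template and search tokens
    (weights shared, Siamese-style), then search tokens attend to template tokens."""
    tz, tx = to_tokens(z), to_tokens(x)
    tz = tz + multihead_attention(tz, tz, heads, *p_self)  # residual connections
    tx = tx + multihead_attention(tx, tx, heads, *p_self)
    tx = tx + multihead_attention(tx, tz, heads, *p_cross)
    return to_map(tz, *z.shape[1:]), to_map(tx, *x.shape[1:])

def interact(r_cls, r_reg, p_c2r, p_r2c, heads):
    """(3) Multi-branch interaction: each branch's response map queries the other's."""
    H, W = r_cls.shape[1:]
    tc, tr = to_tokens(r_cls), to_tokens(r_reg)
    tc_new = tc + multihead_attention(tc, tr, heads, *p_r2c)  # cls reads from reg
    tr_new = tr + multihead_attention(tr, tc, heads, *p_c2r)  # reg reads from cls
    return to_map(tc_new, H, W), to_map(tr_new, H, W)

def attn_params(D):  # hypothetical initialisation of Wq, Wk, Wv, Wo
    return [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(4)]

def init_params(c_in=3, D=16):
    P = {t: {"backbone": rng.standard_normal((D, c_in)) / np.sqrt(c_in),
             "self": attn_params(D), "cross": attn_params(D)} for t in ("cls", "reg")}
    P["c2r"], P["r2c"] = attn_params(D), attn_params(D)
    P["cls_head"] = rng.standard_normal((2, D)) / np.sqrt(D)  # fg/bg logits
    P["reg_head"] = rng.standard_normal((4, D)) / np.sqrt(D)  # box offsets
    return P

def track_step(z_img, x_img, P, heads=4):
    """One forward pass: template z_img (3, h, w), search region x_img (3, H, W)."""
    resp = {}
    for task in ("cls", "reg"):  # (1) separate backbone per task: decoupled branches
        z = backbone(z_img, P[task]["backbone"])
        x = backbone(x_img, P[task]["backbone"])
        z, x = enhance(z, x, P[task]["self"], P[task]["cross"], heads)
        resp[task] = depthwise_xcorr(z, x)
    r_cls, r_reg = interact(resp["cls"], resp["reg"], P["c2r"], P["r2c"], heads)
    cls = np.einsum("oc,chw->ohw", P["cls_head"], r_cls)  # (2, H', W')
    reg = np.einsum("oc,chw->ohw", P["reg_head"], r_reg)  # (4, H', W')
    return cls, reg

P = init_params()
z_img = rng.standard_normal((3, 8, 8))    # template patch
x_img = rng.standard_normal((3, 24, 24))  # search region
cls, reg = track_step(z_img, x_img, P)
print(cls.shape, reg.shape)  # (2, 17, 17) (4, 17, 17)
```

Within each branch, the template and search streams share backbone and self-attention weights, mirroring the weight sharing of a Siamese network. Keeping separate weights per task is what decouples classification from regression in this sketch, while `interact` reopens a controlled channel between the two branches after correlation.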