Object tracking is one of the fundamental tasks in computer vision. Siamese trackers have become the most popular single-object tracking models because they strike a good balance between tracking performance and inference speed. However, commonly used Siamese trackers have several shortcomings that are detrimental to accurate object tracking. To address these problems, this work builds on deep network techniques and proposes two novel Siamese networks together with their tracking algorithms. The main contributions are as follows.

First, to tackle the mismatch among the multiple predictions produced by Siamese networks with multi-output branches, a Siamese tracker with feature reuse and two-stage refinement is proposed. The network consists of two components: a one-stage base tracker and a two-stage complementary module. The base tracker, a standard Siamese network, provides preliminary predictions of the target's location and scale. The complementary module is the core of the proposed tracker: it reuses the template features to assess the feature quality of candidate regions. A matching estimation branch aligns the multiple predictions, preventing the mismatch that arises among them in the first stage, and a proposal refinement branch refines the previously predicted bounding box for accurate and robust tracking. Extensive ablation studies and comparisons with other trackers on various benchmarks indicate that the two-stage refinement effectively improves the tracking performance of Siamese trackers, and that the proposed tracker performs comparably to the latest convolutional neural network (CNN)-based trackers.

Second, to address the failure of existing Siamese networks to take full advantage of the targets' appearance and motion cues, a Siamese tracker with diverse prior information reused is proposed. The network consists of two novel components: channel- and space-aware feature enhancement, and fusion of multi-dimensional cues. The first component incorporates Spatial Attention, Foreground-Background Attention, and Channel Attention to capture target-related cues and strengthen the target feature representation in the network. The second component fuses these multi-dimensional features, mines the target motion cues contained in adjacent video frames through fine-grained feature retrieval, and uses those motion cues to refine the target response map produced by the network during tracking, which enables stable and accurate tracking. Comprehensive experiments, including visualization and ablation studies, show that the Siamese tracker with diverse prior information reused achieves a significant improvement in accuracy and reliability.
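The two-stage idea behind the feature-reuse tracker can be illustrated with a toy sketch. This is not the network described above: it assumes plain (naive) cross-correlation for the base tracker, and it replaces the learned matching estimation branch with a hypothetical `matching_score` that reuses the template features through cosine similarity. The proposal refinement branch (bounding-box regression) is omitted. Stage 1 keeps the top-`k` peaks of the response map; stage 2 re-ranks them by how well each candidate region matches the template. All function names are illustrative.

```python
import numpy as np


def xcorr(template, search):
    """Naive cross-correlation of a template feature map over a search feature map.

    template: (C, h, w), search: (C, H, W) -> response map of shape (H - h + 1, W - w + 1).
    """
    _, h, w = template.shape
    _, H, W = search.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            out[i, j] = np.sum(template * search[:, i:i + h, j:j + w])
    return out


def matching_score(template, region):
    """Hypothetical stand-in for the learned matching estimation branch:
    cosine similarity between the reused template features and a candidate region."""
    a, b = template.ravel(), region.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def two_stage_track(template, search, k=3):
    """Stage 1: keep the top-k peaks of the response map.
    Stage 2: re-rank those candidates by their matching score with the template."""
    _, h, w = template.shape
    response = xcorr(template, search)
    top = np.argsort(response, axis=None)[::-1][:k]
    candidates = [tuple(int(v) for v in np.unravel_index(f, response.shape)) for f in top]
    scores = [matching_score(template, search[:, i:i + h, j:j + w]) for i, j in candidates]
    return candidates[int(np.argmax(scores))], response


# Toy input: a 2x2 diagonal template, a bright uniform distractor, and the true target.
template = np.array([[[1.0, 0.0], [0.0, 1.0]]])
search = np.zeros((1, 5, 5))
search[0, 0:2, 0:2] = 1.5  # distractor: high raw correlation, poor pattern match
search[0, 3, 3] = 1.0      # true target pattern occupies rows/cols 3..4
search[0, 4, 4] = 1.0

best, response = two_stage_track(template, search, k=3)
print(best)  # (3, 3); the raw response peak alone would be (0, 0)
```

In this toy input the distractor wins the raw correlation (3.0 versus 2.0), but the second stage recovers the true location because its features match the template pattern exactly (cosine similarity 1.0 versus about 0.71).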
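The channel- and space-aware feature enhancement can be sketched as three multiplicative reweightings of a feature map of shape (C, H, W). The designs below are assumptions rather than the modules described above: SE-style channel attention with given bottleneck weights `w1` and `w2`, CBAM-style spatial attention with the learned convolution dropped, and a hypothetical foreground-background mask built from a known target box. The helper names, the box format, and the order in which the attentions are applied are all illustrative.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """SE-style channel attention (assumed design): global average pooling ("squeeze"),
    a two-layer bottleneck with ReLU ("excite"), then per-channel rescaling."""
    pooled = feat.mean(axis=(1, 2))                       # (C,)
    weights = sigmoid(w2 @ np.maximum(w1 @ pooled, 0.0))  # (C,), each in (0, 1)
    return feat * weights[:, None, None]


def spatial_attention(feat):
    """CBAM-style spatial attention with the learned convolution omitted (a simplification):
    channel-wise mean and max are summed and squashed into an (H, W) map in (0, 1)."""
    attn = sigmoid(feat.mean(axis=0) + feat.max(axis=0))
    return feat * attn[None, :, :]


def fg_bg_attention(feat, box, fg_weight=1.0, bg_weight=0.5):
    """Hypothetical foreground-background attention: keep features inside the target box
    (x0, y0, x1, y1), end-exclusive, and damp the background by bg_weight."""
    mask = np.full(feat.shape[1:], bg_weight)
    x0, y0, x1, y1 = box
    mask[y0:y1, x0:x1] = fg_weight
    return feat * mask[None, :, :]


def enhance(feat, w1, w2, box):
    """Apply the three attentions in sequence; this ordering is also an assumption."""
    return fg_bg_attention(spatial_attention(channel_attention(feat, w1, w2)), box)


rng = np.random.default_rng(0)
C, H, W, r = 8, 6, 6, 2  # r: channel reduction ratio of the bottleneck
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
out = enhance(feat, w1, w2, box=(1, 1, 4, 4))
print(out.shape)  # (8, 6, 6)
```

Because every attention weight lies in (0, 1], each stage can only emphasize some features relative to others; in a trained network the weights `w1` and `w2` (and a spatial convolution) would be learned rather than random.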
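Using motion cues from adjacent frames to refine the response map can likewise be approximated with a toy sketch. Here simple block matching (sum of squared differences within a small search window) stands in for the fine-grained feature retrieval, and a Gaussian prior centered on the motion-predicted position is blended into the response map. `estimate_motion`, `refine_response`, `sigma`, and `alpha` are hypothetical, and the sketch assumes that the feature maps and the response map share the same spatial coordinates.

```python
import numpy as np


def estimate_motion(prev_feat, cur_feat, center, patch=1, radius=2):
    """Block matching, used here as a simple stand-in for fine-grained feature retrieval.

    Returns the displacement (dy, dx), within +/- radius, whose patch in the current frame
    best matches (lowest sum of squared differences) the patch around `center` in the
    previous frame. Feature maps have shape (C, H, W)."""
    cy, cx = center
    ref = prev_feat[:, cy - patch:cy + patch + 1, cx - patch:cx + patch + 1]
    best, best_cost = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = cy + dy, cx + dx
            if y - patch < 0 or x - patch < 0:
                continue  # patch would fall off the top or left edge
            cand = cur_feat[:, y - patch:y + patch + 1, x - patch:x + patch + 1]
            if cand.shape != ref.shape:
                continue  # patch would fall off the bottom or right edge
            cost = float(np.sum((cand - ref) ** 2))
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best


def refine_response(response, predicted_center, sigma=1.5, alpha=0.5):
    """Blend the response map with a Gaussian motion prior centered on the predicted position."""
    H, W = response.shape
    ys, xs = np.mgrid[0:H, 0:W]
    py, px = predicted_center
    prior = np.exp(-((ys - py) ** 2 + (xs - px) ** 2) / (2 * sigma ** 2))
    return (1 - alpha) * response + alpha * prior


# Toy example: the target moves from (3, 3) to (4, 5); a distractor peaks at (1, 1).
prev_feat = np.zeros((1, 8, 8))
prev_feat[0, 3, 3] = 1.0
cur_feat = np.zeros((1, 8, 8))
cur_feat[0, 4, 5] = 1.0
response = np.zeros((8, 8))
response[1, 1], response[4, 5] = 1.0, 0.9

dy, dx = estimate_motion(prev_feat, cur_feat, center=(3, 3))
refined = refine_response(response, predicted_center=(3 + dy, 3 + dx))
peak = tuple(int(v) for v in np.unravel_index(np.argmax(refined), refined.shape))
print((dy, dx), peak)  # (1, 2) (4, 5)
```

Without the motion prior the distractor at (1, 1) would be selected; after blending, the peak moves to the position predicted from the estimated displacement.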