| As a fundamental research topic in computer vision, visual object tracking has a wide range of applications in intelligent security, video human-computer interaction, and military reconnaissance. The single-object tracking studied in this project locates and predicts a single object throughout the subsequent image sequence based on the information provided in the first frame. Deformation of the object during motion and the complex environment encountered during tracking pose great challenges, and in real applications both the tracking accuracy and the real-time performance of the algorithm matter. Therefore, designing a tracker that balances accuracy and speed has strong practical significance. In recent years, the fully-convolutional siamese network for object tracking (SiamFC) has attracted widespread attention due to its end-to-end training and its tracking speed far beyond real time. However, it struggles to maintain accuracy in complex environments, leaving considerable room for improvement. Lightweight networks use fewer computational resources and less memory while maintaining model accuracy, making them better suited to real-time tasks than traditional deep networks. Therefore, this paper introduces a lightweight network into SiamFC and improves the algorithm from several perspectives. The main research contents are as follows: (1) To address the weak feature extraction ability of current tracking algorithms, a tracking algorithm based on a lightweight siamese network with enhanced features is proposed. First, an improved ShuffleNet V2 replaces AlexNet as the feature extractor, fully utilizing the advantages of deep networks and greatly improving the tracking speed while reducing the model’s parameter count and computational complexity. Second, a channel and spatial attention mechanism is proposed to enlarge the gap in response values between different channels and spatial positions, learning
more beneficial features. Third, a hierarchical feature fusion method is proposed that fully utilizes deep semantic features and shallow spatial information. Experimental results on the OTB100 and VOT2018 datasets show that, compared to the baseline algorithm, precision improves by 8.3% while real-time operation is maintained, and the model size is reduced to 1/6 of the original. The proposed algorithm also shows strong robustness in difficult scenarios. (2) A tracking algorithm based on negative example mining and feature fusion is proposed to address the poor feature discrimination of SiamFC caused by the imbalanced distribution of positive and negative examples during offline training. First, a lightweight backbone network is constructed using ShuffleNet V2 for feature extraction. Second, a negative example mining strategy is proposed to counteract the data distribution imbalance by constructing negative sample pairs from both the same and different categories, improving the model’s feature discrimination. Third, a multi-scale feature fusion strategy is used to fully exploit the advantages of deep and shallow networks: similarity response maps from different layers are fused to represent the target from multiple perspectives, further improving tracking performance. Finally, experiments on the OTB100 and VOT2018 datasets demonstrate that the proposed algorithm improves precision by 8.3% over the baseline algorithm and reduces the network’s parameter count to 1/5 of the original, while the tracking speed remains far beyond real time. (3) A tracking algorithm that updates the template based on response map confidence is proposed to address the poor matching caused by target deformation or rotation in siamese network tracking methods, which always match the search region against the initial template from the first frame. First, a template branch is added to store a short-term template, improving online tracking. Second, two confidence metrics are
developed based on the similarity response maps, and high-confidence search image patches are selected as the latest short-term template for adaptive template updates. Third, the intersection over union and the peak value difference are computed for the regions predicted with the short-term and initial templates, and the result that best fits the current target is selected to achieve more accurate target localization. Finally, detailed experiments on the OTB100 and VOT2018 datasets demonstrate that the proposed algorithm further improves on the baseline algorithm, achieving a precision improvement of 8.8% and more robust tracking. |
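
The selection step in (3), which compares the predictions obtained with the initial and short-term templates using intersection over union and the peak value difference, can be illustrated with a minimal sketch. The abstract does not give the actual decision rule, so everything here is an assumption made for illustration: the `(x, y, width, height)` box format, the function names `iou`, `peak`, and `select_prediction`, the thresholds `iou_thresh` and `peak_margin`, and the rule itself (prefer the short-term prediction only when it overlaps the initial-template prediction and has a higher response peak).

```python
from typing import Sequence, Tuple

# Assumed box format: (x, y, width, height) with (x, y) the top-left corner.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def peak(response: Sequence[Sequence[float]]) -> float:
    """Maximum value of a 2-D similarity response map."""
    return max(max(row) for row in response)


def select_prediction(
    box_init: Box,
    resp_init: Sequence[Sequence[float]],
    box_short: Box,
    resp_short: Sequence[Sequence[float]],
    iou_thresh: float = 0.5,   # hypothetical threshold, not from the source
    peak_margin: float = 0.0,  # hypothetical margin, not from the source
) -> Box:
    """Hypothetical rule: use the short-term prediction only when it agrees
    with the initial-template prediction (IoU) and responds more strongly
    (peak value difference); otherwise fall back to the initial template."""
    overlap = iou(box_init, box_short)
    peak_diff = peak(resp_short) - peak(resp_init)
    if overlap >= iou_thresh and peak_diff > peak_margin:
        return box_short
    return box_init
```

Falling back to the initial template whenever the two predictions disagree is one conservative design choice among several; it limits drift from a contaminated short-term template at the cost of slower adaptation to appearance change.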