Long-term target tracking is affected by various complex factors, such as the disappearance and reappearance of the target and drastic changes in target scale and appearance, and the performance of existing algorithms still falls short of what practical applications require. With advances in science and the related algorithmic theory, researchers have used a variety of methods to obtain high-level, abstract deep features that improve the accuracy of tracking algorithms. To address target disappearance and reappearance, drastic changes in target scale and appearance, and the error accumulation that arises in tracking methods based on the assumption of spatio-temporal consistency, we use deep learning methods to extract deep semantic information about the tracked target. By encoding and decoding the shallow and deep features of the target, our method outputs an accurate target position and improves the accuracy and robustness of the long-term target tracking model. The main contents of this paper are as follows:

(1) A long-term target tracking model based on multi-scale global retrieval and spatio-temporal consistency matching is proposed. We redefine the long-term object tracking task as a global retrieval task: global retrieval is performed over the search image using a fixed exemplar image. This ensures that the inference results of the individual frames are independent of one another. As a result, the error accumulation found in tracking methods based on the assumption of spatio-temporal consistency is avoided, and the problem of target disappearance and reappearance can be overcome. Atrous convolution is used to obtain deep semantic information about the target under different receptive fields, which addresses tracking failures caused by changes in target scale and appearance. Provided that no error accumulation is introduced, a spatio-temporal consistency matching model is added to further improve tracking accuracy. Comparative and ablation experiments on VOT2018-LT and OxUvA demonstrate, respectively, the effectiveness of the proposed model and of the spatio-temporal consistency matching model.

(2) Building on the above model, and to further improve its accuracy, inference speed, and robustness, we propose an improved tracking model based on attention and a new online update mechanism. To address the limited real-time performance of the tracking model, we use depthwise separable convolution to reduce the amount of computation and increase inference speed. To address its limited accuracy, a lightweight non-local attention module is introduced to enhance the self-encoding and cross-encoding capabilities of the feature maps, improving accuracy while preserving inference speed. To keep the model robust during long-term inference, an online update mechanism is introduced, which enables the model to learn deep information about the target in different scenes. The effectiveness of the improved model is demonstrated by performance analysis experiments on VOT2018-LT, OxUvA, and GOT-10k.
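
The two convolution variants named above can be illustrated with some simple arithmetic. The following is a minimal sketch, not the model described in this paper: it uses the standard textbook formulas for the span of an atrous (dilated) kernel and for the multiply-accumulate cost of standard versus depthwise separable convolution. The function names, the feature-map size, the channel counts, and the dilation rates are all assumptions chosen for illustration.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span along one axis covered by a single atrous (dilated) kernel.

    A dilation of d inserts d - 1 gaps between neighbouring kernel taps.
    """
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def standard_conv_macs(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """Multiply-accumulates of a standard k x k convolution on an h x w output."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """Multiply-accumulates of a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical dilation rates for a 3 x 3 kernel (not taken from the paper).
    for d in (1, 2, 4):
        print("dilation", d, "-> effective kernel size", effective_kernel_size(3, d))

    # Hypothetical 32 x 32 feature map with 256 input and output channels.
    std = standard_conv_macs(32, 32, 256, 256, 3)
    sep = depthwise_separable_macs(32, 32, 256, 256, 3)
    print("cost ratio (separable / standard):", sep / std)
```

The cost ratio equals 1/c_out + 1/k^2, so for a 3 x 3 kernel with many output channels the separable form needs roughly one ninth of the computation, while a larger dilation rate widens the receptive field without adding kernel weights.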