Object tracking is an important and challenging research direction in computer vision. The rapid development of deep-learning-based image and video processing has led to its increasingly wide use in human-computer interaction, intelligent transportation, video surveillance, and other fields. New tracking models are continually proposed to improve tracking accuracy and precision. However, factors such as occlusion, deformation, scale change, and complex backgrounds still degrade the robustness and accuracy of tracking algorithms. In recent years, deep learning has achieved great success in computer vision tasks such as object detection, image segmentation, and image classification, and it has been rapidly developed and applied in object tracking as well. Deep learning helps address the low accuracy and poor robustness of the target features extracted by traditional filter-based tracking algorithms. Nevertheless, single-object tracking algorithms based on deep learning still have many problems. For example, Siamese network trackers extract insufficient target feature information in complex scenes, and because their classification and regression branches are optimized independently, a location with high classification confidence is not necessarily the target. Because the Transformer tracking algorithm uses only the first frame as its learning template, tracking easily fails as update errors accumulate. In addition, because the Transformer tracker uses only the last layer of the ResNet-50 network, fine feature detail is lost. To solve these problems of Siamese network trackers and Transformer trackers, this paper studies improved single-object tracking algorithms based on deep learning and uses a
convolutional neural network to build an end-to-end target tracking framework. Based on this framework, this paper proposes two improved single-object tracking algorithms. The main research contents and contributions are as follows:

1) Currently popular single-object trackers based on Siamese networks have weak discriminative ability, which makes it difficult for them to filter out distractors from complex backgrounds. This paper proposes a Siamese network tracking algorithm (SiamAR) based on a multi-scale attention mechanism and a relation detection module. The algorithm adds a multi-scale attention mechanism to the SiamRPN++ algorithm and integrates spatial attention and channel attention to improve the model's ability to learn feature information, so that it selectively focuses on useful features and suppresses useless ones. The algorithm also obtains information at different scales from different receptive fields. In addition, the relation detection module filters out interference from complex backgrounds, allowing the tracker to identify targets in cluttered scenes. The algorithm is evaluated on five datasets, including OTB-2015, and the results show that it significantly outperforms several well-known tracking algorithms in tracking accuracy and robustness.

2) Existing Transformer-based tracking algorithms use only the features of the last layer of the ResNet-50 network, which do not provide enough detail to locate the target accurately. In addition, using only a fixed linear update mechanism during tracking easily leads to tracking failure. This paper proposes a Transformer object tracking algorithm (AOTTracker) based on an attention feature fusion module and an online update module. Because high-level features carry more semantic information, while low-level features carry more detailed feature information and more accurate
location information, the algorithm introduces an Attention Feature Fusion Module (AFFM), which captures rich detail while retaining semantic, feature, and location information, thereby improving feature expression ability. An online update module built on the Transformer architecture adapts better to changes in the target's appearance, improves the generalization ability of the model, and addresses the drift caused by tracking failures and accumulated update errors when only the first-frame template is learned. The algorithm is evaluated on five datasets, including LaSOT, and the experimental results verify that it effectively improves tracking accuracy and robustness.
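
As a hedged illustration of the fixed linear update mechanism mentioned above, the following minimal Python sketch blends a template with each new observation at a constant rate. The function names (`linear_update`, `run_fixed_rate_tracker`), the two-element feature vectors, and the rate of 0.1 are hypothetical choices made for illustration only. This is not the online update module of AOTTracker, which the abstract describes as built on the Transformer architecture; it only shows the baseline behaviour that such a module is meant to improve on.

```python
from typing import List, Sequence


def linear_update(template: Sequence[float],
                  observation: Sequence[float],
                  rate: float) -> List[float]:
    """Blend the current template with a new observation at a fixed rate.

    new_template = (1 - rate) * template + rate * observation
    """
    if len(template) != len(observation):
        raise ValueError("template and observation must have the same length")
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    return [(1.0 - rate) * t + rate * o for t, o in zip(template, observation)]


def run_fixed_rate_tracker(initial: Sequence[float],
                           observations: Sequence[Sequence[float]],
                           rate: float) -> List[float]:
    """Apply the fixed-rate update once per frame and return the final template."""
    template = list(initial)
    for observation in observations:
        template = linear_update(template, observation, rate)
    return template


# Hypothetical toy features: the true target and a background distractor.
target = [1.0, 0.0]
distractor = [0.0, 1.0]

# Ten consecutive frames in which the tracker observes the distractor
# (for example, during occlusion). The rate of 0.1 is an assumed value.
drifted = run_fixed_rate_tracker(target, [distractor] * 10, rate=0.1)
print(drifted)
```

In this toy setting the weight of the original target in the template decays geometrically, to 0.9^10 (about 0.35) after ten distractor frames. Because the rate is fixed, the update cannot tell reliable frames from unreliable ones, which is one way to read the accumulated drift the abstract attributes to fixed linear updates.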