| Single object tracking is one of the key research directions in computer vision. Given the target's information in the first frame of a video or image sequence, the task is to predict the target's location in each subsequent frame. Single object tracking has important applications in unmanned aerial vehicles, military guidance, video surveillance, human-computer interaction, and other fields, and has therefore attracted much attention. Since the task was proposed, a large body of research results has accumulated, but under complex conditions researchers still need to improve tracking networks to achieve real-time, accurate tracking. An excellent single object tracking network should track the object accurately while maintaining real-time performance. To address these problems, this thesis makes improvements in the following aspects: (1) To extract target features faster and better, this thesis introduces into the feature extraction network a new paradigm that combines convolution and self-attention. It retains the advantages of both with less overhead, speeds up the feature extraction network, and improves downstream tasks. (2) To better capture the correlation between the target and the search area, this thesis uses a feature fusion mechanism adapted to the new paradigm used in feature extraction, capturing that correlation simply and efficiently. The feature fusion mechanism consists of an optimized encoder-decoder, which we call TED (Transformer encoder-decoder), and it improves the performance of the tracking network. (3) To suit different application scenarios, two target prediction methods are adopted. The first is a simple method that uses a plain region proposal network. The second meets high-performance requirements by adding a center-point branch that makes the predicted bounding box more accurate. Finally, our method was evaluated on several large public datasets, including LaSOT, TLP, TrackingNet, and UAV123, and on the smaller public benchmarks VOT2020 and OTB100, where it achieved advanced performance. |