Visual Object Tracking Algorithm Based On Fully Convolutional Siamese Network

Posted on: 2022-02-08
Degree: Master
Type: Thesis
Country: China
Candidate: C M Hu
Full Text: PDF
GTID: 2518306731987949
Subject: Computer Science and Technology
Abstract/Summary:
Visual object tracking is one of the fundamental tasks in computer vision. In the general setting, the size and position of the target of interest are given in the first frame of a video, and the tracker must estimate the target's size and position in every subsequent frame. Because no restrictions are placed on the target or its environment, developing object tracking algorithms is challenging. In recent years, deep learning has provided new ideas for research on object tracking, and trackers based on the Siamese network have become popular among researchers. However, the original fully convolutional Siamese network tracker has two main shortcomings. First, the template is cropped only from the first frame of the video, which makes it difficult to represent the appearance of the target. Second, the target size estimation method is too simple, which limits tracking accuracy. To address these two shortcomings, this thesis proposes the following two improved models:

(1) SiamMT maintains multiple templates while the algorithm runs in order to represent the appearance of the target. A selection module chooses the best template from the template set to track the target in the current frame, and an update module decides whether to update the template set with the current template. The selection module uses a reliability score to screen for reliable templates and a matching score to select the best one among them; the update module uses the joint IoU to assess the diversity of the current template and replaces templates in the template set according to the cumulative tracking loss. Experiments on OTB2015, VOT2016, and VOT2017 verify the effectiveness of the proposed algorithm.

(2) This thesis proposes a target scale estimation network (SiamTE). SiamTE adds a classification branch and a regression branch at the end of the fully convolutional Siamese network. The output response maps of the two branches have the same size; the classification response map has one channel, and the regression response map has four channels. The classification branch separates target from background in the search frame, and the regression branch estimates the position and size of the target. To obtain better regression results, this thesis proposes a new IoU loss function, based on the original IoU loss, to train the regression branch. Experiments on OTB2015, VOT2018, and VOT2019 verify the robustness of the SiamTE model, and comparative experiments on OTB2015 demonstrate the effectiveness of the proposed IoU loss function.

The algorithms proposed in this thesis improve the fully convolutional Siamese network in both target appearance representation and scale estimation. Compared with state-of-the-art trackers, SiamMT and SiamTE achieve competitive results on the public OTB and VOT datasets. SiamTE runs at 180 frames per second, which meets real-time requirements. These improvements give the proposed tracking algorithms broader application prospects in intelligent driving, video surveillance, and other fields.
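The abstract does not state the form of the new IoU loss, so it is not reproduced here. For reference only, the sketch below shows one common formulation of the baseline IoU loss, -ln(IoU), for a predicted and a ground-truth box. The function names, the (x1, y1, x2, y2) box format, and the small `eps` constant are assumptions made for illustration and are not taken from the thesis.

```python
import math


def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2
    (an assumed format, chosen for this sketch).
    """
    # Corners of the intersection rectangle.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred, target, eps=1e-7):
    """Baseline IoU loss, -ln(IoU); eps keeps the log finite at IoU = 0."""
    return -math.log(box_iou(pred, target) + eps)
```

The loss is close to zero when the predicted box matches the ground truth and grows as the overlap shrinks, which is why it is a natural starting point for training a box regression branch.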
Keywords/Search Tags: Object tracking, Siamese network, Classification, Regression, Loss function