
Research On Visual Object Tracking Algorithm Based On Deep Learning

Posted on: 2023-12-27
Degree: Master
Type: Thesis
Country: China
Candidate: C Yang
GTID: 2568307082482604
Subject: Electronic and communication engineering
Abstract:
Visual object tracking is a common high-level task in computer vision. Given a target manually selected in the first frame, the tracker must follow that target effectively and in real time. Traditional methods based on correlation filtering have many limitations when tracking in real-world scenes, including scale variation, rotation, interference from similar objects, and small targets. In this thesis, we implement two visual object trackers, one based on a convolutional neural network (CNN) and one based on the Transformer, to address two problems respectively: continuous target rotation and interference from similar objects.

Siamese networks are one of the most popular directions in deep-learning-based visual object tracking. For the CNN-based tracker, we propose an effective Siamese network named SiamPBN to solve the rotation problem in tracking. We find that when the object rotates continuously, the predicted center of the object deviates from the center of the predicted bounding box. We therefore add a centerness head to the network to improve its performance on rotating objects. In addition, SiamPBN produces its predictions with a point-based rather than an anchor-based method, thus avoiding the complex computation associated with anchors. The proposed method achieves excellent performance on datasets including GOT-10k, LaSOT and VOT2019. On the VOT2019_ROT dataset, SiamPBN's robustness score improves from 0.139 (SiamBAN) to 0.046 (lower is better), and its expected average overlap (EAO) increases from 0.432 to 0.514.

In Siamese networks, the feature pyramid network (FPN) performs feature fusion, and cross-correlation matches the features extracted by the template and search branches. However, object tracking should attend to global and contextual dependencies. We therefore introduce into the tracker's neck a novel Transformer encoder-decoder structure built on the self-attention mechanism. In this structure, the encoder promotes interaction between the low-level features that the CNN backbone extracts from the target and search branches to obtain global attention information, while the decoder replaces cross-correlation and passes this global attention information to the head module. We also add a spatial and channel attention component to the target branch, which further improves the robustness of the model at little extra cost. Finally, we evaluate our tracker, CTT, in detail on the GOT-10k, VOT2019, OTB-100, LaSOT, NfS, UAV123 and TrackingNet benchmarks, where it achieves results competitive with state-of-the-art algorithms.
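The abstract says SiamPBN predicts boxes from points rather than anchors and adds a centerness head, but it does not give the formulas. The sketch below is a minimal illustration under an assumption: it uses the widely used FCOS-style definitions, where each score-map location regresses its distances (l, t, r, b) to the four box edges and centerness is sqrt((min(l, r)/max(l, r)) * (min(t, b)/max(t, b))). The helper names, the grid `stride`/`offset` values, and the "classification score times centerness" ranking rule are hypothetical and are not taken from the thesis.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in image pixels


def grid_points(size: int, stride: int, offset: float) -> List[Tuple[float, float]]:
    """Map each cell of a size x size score map to an image coordinate.

    `stride` and `offset` are hypothetical; real values depend on the
    backbone and the search-region size of the tracker.
    """
    return [(offset + j * stride, offset + i * stride)
            for i in range(size) for j in range(size)]


def ltrb_target(point: Tuple[float, float], box: Box) -> Tuple[float, float, float, float]:
    """Point-based regression target: distances from the point to the four box edges."""
    x, y = point
    x0, y0, x1, y1 = box
    return (x - x0, y - y0, x1 - x, y1 - y)


def centerness(l: float, t: float, r: float, b: float) -> float:
    """FCOS-style centerness (assumed): 1.0 at the box center, tending to 0 near an edge."""
    if min(l, t, r, b) <= 0:  # point lies on or outside the box border
        return 0.0
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))


def decode(point: Tuple[float, float], ltrb: Tuple[float, float, float, float]) -> Box:
    """Turn a predicted (l, t, r, b) at a point back into a box; no anchors are involved."""
    x, y = point
    l, t, r, b = ltrb
    return (x - l, y - t, x + r, y + b)


def pick_box(points, cls_scores, ctr_scores, ltrbs) -> Box:
    """Pick the location whose classification score times predicted centerness is highest.

    Down-weighting off-center locations is the job the abstract assigns to the
    centerness head when the target rotates (hypothetical ranking rule).
    """
    best = max(range(len(points)), key=lambda k: cls_scores[k] * ctr_scores[k])
    return decode(points[best], ltrbs[best])


if __name__ == "__main__":
    box = (40.0, 40.0, 120.0, 100.0)
    print(centerness(*ltrb_target((80.0, 70.0), box)))  # 1.0 at the box center
    print(centerness(*ltrb_target((50.0, 70.0), box)))  # about 0.378, off-center
```

Because every location carries its own (l, t, r, b) prediction, there are no anchor shapes or anchor-to-box IoU matching steps to compute, which is the saving the abstract attributes to the point-based design.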
Keywords: Visual Object Tracking, CNN, Transformer, Attention Mechanism, Siamese Network