
Research On Siamese Network Based Visual Object Tracking

Posted on: 2024-07-27 | Degree: Master | Type: Thesis
Country: China | Candidate: W Xiao | Full Text: PDF
GTID: 2568307106499384 | Subject: Computer Science and Technology
Abstract/Summary:
With continued breakthroughs in deep learning and steady improvements in computer software and hardware, computer vision has made great strides. Visual object tracking, an important branch and core task of computer vision, has long been an active research topic. Given the position and scale of the target in the first frame of a video, the tracker must predict the state of the target in each subsequent frame and thereby recover the target's motion trajectory over the entire video sequence. Visual object tracking has recently attracted widespread attention because of its broad applications, such as intelligent transportation, human-machine interaction, and intelligent surveillance, and it has developed considerably. However, adaptive target tracking in complex real-world scenarios remains a major challenge because of occlusion, background clutter, scale variation, and fast motion. This thesis targets two problems: tracking drift caused by background noise in complex real-world scenes, and the limited flexibility of target scale estimation caused by inefficient scale estimation modules. Building on Siamese network-based target tracking, it investigates and improves the tracking algorithm from the perspectives of feature extraction, feature fusion, target prediction, and result fine-tuning.

To address tracking drift caused by incomplete use of features and inefficient matching algorithms, this thesis first proposes a Siamese network-based object tracking algorithm built on feature fusion and fine-grained matching, from the perspectives of feature extraction and feature fusion. A spatial feature fusion module applies different weights to different positions of the multi-layer features, which makes the network more sensitive to the spatial location of the target. The module also fuses detailed information such as edge contours from the lower layers with the semantic information in the higher-layer features, which improves the network's perception of target foreground and background. In addition, the traditional cross-correlation matching algorithm matches over a large area in a single step, so background clutter affects the match. To address this, we propose a fine-grained matching algorithm that matches the target in the search area pixel by pixel while keeping the loss of target template information low. This reduces the effect of background clutter on matching and enhances the model's perception of the target. To verify the effectiveness of the proposed algorithm, extensive experiments are carried out on the public datasets OTB50, OTB100, and VOT2019, and the algorithm is compared with classical tracking algorithms on 11 tracking challenges, including target occlusion, background clutter, scale variation, and fast motion. The experiments show that our method can significantly improve tracking performance in complex real-world tracking scenarios.

Although the above algorithm based on feature fusion and fine-grained matching improves the accuracy of target prediction, its scale estimation still relies on a region proposal network. The region proposal network introduces a large number of hyperparameters, and its adaptability to target scale changes has an upper limit, so the tracker estimates the target scale inaccurately. To solve this problem, this thesis proposes a Siamese network tracking algorithm based on result refinement and fine-grained scale regression, from the perspectives of target prediction and result fine-tuning. First, to strengthen the tracker's ability to detect the target, we propose a pixel-based sample partition method for the training process. Partitioning samples pixel by pixel avoids the background interference that coarse sample partition introduces during training and makes pixel-wise classification and regression possible. On this basis, the traditional region proposal network, which obtains the target position and scale through preset anchors and candidate boxes, is abandoned in favor of a new pixel-by-pixel classification-regression method. This method classifies each pixel and, taking the classified center point as the reference, regresses the perpendicular distance from that point to the boundaries of the predicted bounding box, thereby indirectly obtaining the bounding box scale at the current pixel. It greatly reduces the number of prior hyperparameters and significantly improves the flexibility of scale prediction. In addition, to address the occasional bounding box drift caused by inaccurate bounding box prediction in complex real-world scenarios, the algorithm uses a plug-and-play module to fine-tune the tracking results. Once the approximate position of the target has been determined, this module resamples and detects again to produce more accurate results, and it incorporates our fine-grained matching algorithm to enhance spatial awareness of the target during prediction. To verify the effectiveness of the algorithm, we conducted extensive experiments on the public datasets OTB50, OTB100, and VOT2019, and compared our algorithm with other classical tracking algorithms on 11 tracking challenges in the OTB100 dataset. The comparison results and visualized examples show that our method can significantly improve the tracker's ability to track targets in complex real-world scenarios.

In summary, this thesis makes several improvements to existing Siamese network object tracking algorithms. First, we apply spatial feature fusion and fine-grained matching so that the tracking network uses features more completely and the matching algorithm detects the target more accurately. Then, on this basis, to improve the accuracy of target prediction, we use pixel-level classification-regression and result fine-tuning to make the tracker more flexible under target scale changes and to reduce tracking drift. Finally, extensive experiments on public datasets verify the effectiveness of the improved algorithms.
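
The pixel-by-pixel classification-regression step described above can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes an FCOS-style anchor-free head in which each score-map location regresses four perpendicular distances (left, top, right, bottom) to the box boundaries, in search-image pixels. The function name decode_pixelwise_prediction and the stride and offset parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

def decode_pixelwise_prediction(cls_score, ltrb, stride=8, offset=0):
    """Turn per-pixel outputs into one bounding box (illustrative sketch).

    cls_score: (H, W) array of foreground scores, one per score-map location.
    ltrb:      (4, H, W) array of assumed regressed distances from each
               location to the left, top, right, and bottom box boundaries,
               expressed in search-image pixels.
    stride, offset: hypothetical mapping from score-map indices to
               search-image coordinates (x = offset + col * stride).
    """
    h, w = cls_score.shape
    # Classification: pick the location with the highest foreground score.
    row, col = np.unravel_index(np.argmax(cls_score), (h, w))
    # Map that location back to search-image coordinates.
    cx = offset + col * stride
    cy = offset + row * stride
    # Regression: distances from this point to the four box boundaries.
    left, top, right, bottom = ltrb[:, row, col]
    box = (cx - left, cy - top, cx + right, cy + bottom)  # (x1, y1, x2, y2)
    return box, float(cls_score[row, col])
```

Because the box is built directly from the four regressed distances, its width and height are not tied to any preset anchor shapes, which is the flexibility the abstract attributes to dropping the region proposal network.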
Keywords/Search Tags: Siamese neural network, visual object tracking, cross correlation, feature fusion, anchor-free network