
Research on Visual Object Tracking Algorithm Based on Deep Siamese Network

Posted on: 2023-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Jiang
GTID: 2568306794455244
Subject: Computer technology
Abstract/Summary:
Accurate and robust visual object tracking is one of the most challenging fundamental tasks in computer vision, with a wide range of applications in scenarios such as intelligent security, autonomous driving, and machine vision. Given only the initial target state, object tracking aims to accurately estimate the subsequent trajectory and state of the target throughout the image sequence, which places high demands on tracking algorithms. After years of research by scholars at home and abroad, object tracking methods have made great progress. In recent years in particular, algorithms based on Siamese networks have provided a new tracking paradigm that has received extensive attention and greatly advanced the field. In practice, however, the target may undergo scale change, deformation, and occlusion, and tracking is further disturbed by complex factors such as illumination change, blur, and background clutter, all of which make the task challenging. This dissertation studies the deep Siamese network framework, analyzes its deficiencies in feature representation and target state estimation, and proposes three new visual object tracking algorithms. The main contributions are as follows:

(1) A Siamese network tracking algorithm with object boundary information is proposed to address the insufficient localization and prediction accuracy of the anchor-based tracker SiamMask when it tracks through mask prediction. First, a multi-layer feature aggregation strategy weights and fuses the multi-layer response maps to capture detail and semantic information at different levels and to make the feature representation more robust. Second, a boundary prediction branch is explicitly added alongside the mask prediction, using object boundary information to improve localization accuracy. Last, a boundary feature fusion strategy and boundary Gaussian heatmap supervision are proposed to further promote mutual learning between the mask and the boundary, so that the predicted mask aligns better with the object boundary.

(2) A target-cognizant anchor-free Siamese network tracking algorithm is proposed to address the problem that anchor-free Siamese trackers extract template features and search region features independently of each other. First, a target-aware attention block is designed to refine the measurement of spatial similarity by computing cross-spatial attention between the template and the search region, so that relevant target appearance information is transmitted before the correlation operation. Then, two simple and effective accurate-tracking mechanisms are introduced, one improving feature fusion and the other improving bounding box prediction, to make the tracker better suited to the real tracking process. Finally, a max filtering module is proposed that exploits the latent localization ability of the regression branch to help distinguish similar distractors, improving the robustness of the algorithm and bridging the gap between anchor-free and anchor-based tracking algorithms.

(3) A dual-stream object tracking algorithm based on the vision Transformer is proposed to address the problem that most Transformer-based object tracking algorithms largely overlook the Transformer's ability in feature extraction and decoding prediction. First, an attention-based vision Transformer serves as the Siamese backbone network for feature extraction, compensating for the limited receptive field of convolutional neural networks and yielding a more robust feature representation. Second, the template and search region features are fully fused through a Transformer encoder-decoder structure, and target-specific information is learned. Finally, predictions are made separately from the two learned streams and then combined by weighted fusion at the decision level, so that the dual-stream information is used effectively and the two streams complement each other.

Qualitative and quantitative experiments on a number of tracking datasets show that the proposed algorithms improve tracking performance through more robust feature representation and more accurate target state estimation, and achieve accurate object tracking in complex tracking scenarios.
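As a rough illustration of the Siamese tracking paradigm and of the weighted fusion of multi-layer response maps mentioned in contribution (1), the toy sketch below slides a template over a search region to obtain a response map, fuses two such maps with normalized weights, and takes the peak as the target location. It is a minimal sketch under stated assumptions, not the dissertation's implementation: the function names (`cross_correlate`, `fuse_responses`, `argmax_2d`) are hypothetical, plain 2-D lists stand in for deep feature maps, and the fusion weights are hand-picked here rather than learned.

```python
def cross_correlate(search, template):
    """Slide `template` over `search` (2-D lists) and return the response map."""
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    response = []
    for y in range(sh - th + 1):
        row = []
        for x in range(sw - tw + 1):
            # Similarity score: inner product of the template and the window.
            score = sum(
                search[y + i][x + j] * template[i][j]
                for i in range(th)
                for j in range(tw)
            )
            row.append(score)
        response.append(row)
    return response


def fuse_responses(maps, weights):
    """Weighted sum of same-sized response maps; weights are normalized to sum to 1."""
    total = sum(weights)
    norm = [w / total for w in weights]
    height, width = len(maps[0]), len(maps[0][0])
    return [
        [sum(n * m[y][x] for n, m in zip(norm, maps)) for x in range(width)]
        for y in range(height)
    ]


def argmax_2d(response):
    """Return the (row, col) position of the largest response."""
    positions = (
        (y, x) for y in range(len(response)) for x in range(len(response[0]))
    )
    return max(positions, key=lambda p: response[p[0]][p[1]])


# Toy "features" from two layers (assumption: stand-ins for shallow/deep features).
template = [[1, 1], [1, 1]]
shallow_search = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
deep_search = [[2 * v for v in row] for row in shallow_search]

shallow_map = cross_correlate(shallow_search, template)
deep_map = cross_correlate(deep_search, template)

# Hand-picked weights (assumption); a real tracker would typically learn them.
fused = fuse_responses([shallow_map, deep_map], weights=[1.0, 3.0])
peak = argmax_2d(fused)
```

Here the peak of the fused map lies at row 1, column 2, the top-left corner of the 2x2 block in the search region. The same weighted-sum pattern could also serve as a sketch of the decision-level fusion of two prediction streams in contribution (3), again with the weights treated as an assumption.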
Keywords/Search Tags: Object tracking, Siamese network, Feature representation, Attention mechanism, Transformer