Model Learning, Compression, and Integration for Robust Visual Object Tracking

Posted on:2022-01-23

Degree:Doctor

Type:Dissertation

Country:China

Candidate:N Wang

Full Text:PDF

GTID:1488306323462874

Subject:Information and Communication Engineering

Abstract/Summary:

Visual object tracking is a fundamental task in computer vision. Given the initial target state, a visual tracker is required to locate the target object in successive frames. In recent years, deep-learning-based visual tracking has made significant progress. Nevertheless, deep learning techniques also bring issues such as expensive annotation cost, high model complexity, and limited tracking efficiency. To release the potential of deep visual tracking, this thesis focuses on three aspects: model learning, model compression, and model integration. The main contributions of this thesis are three-fold:

In model learning, this thesis investigates how to reduce the training cost of deep trackers and how to leverage the temporal information that resides in tracking videos. First, to alleviate the expensive data labeling and high training cost in deep visual tracking, this thesis presents an unsupervised tracking framework based on forward and backward tracking trajectory analysis. The motivation is that a robust tracker should be effective in bidirectional tracking. In the training process, the proposed algorithm measures the consistency between forward and backward trajectories to learn a robust tracker from scratch using only unlabeled videos. The proposed unsupervised tracker reaches the baseline accuracy of classic fully supervised trackers while running at real-time speed. Next, to exploit the rich temporal information in the video flow, this thesis introduces the transformer architecture to the visual tracking community. The transformer encoder-decoder structure tightly bridges the isolated video frames to propagate rich temporal cues (e.g., target features and attention masks) across frames. By virtue of the proposed transformer, existing trackers gain substantial performance improvements and achieve state-of-the-art accuracy.

In model compression, to tackle the high computational complexity, large number of model parameters, and unsatisfactory tracking efficiency of deep tracking, this thesis proposes to jointly compress and transfer heavyweight tracking models. This work treats a CNN model pretrained on the image classification task as a teacher network and distills it into a lightweight student network that serves as the feature extractor to speed up correlation filter trackers. In the distillation process, this thesis proposes a fidelity loss that enables the student network to maintain the representation capability of the teacher network, and designs a tracking loss to adapt the objective of the student network from object recognition to visual tracking. Extensive experiments on standard datasets demonstrate that the lightweight student network accelerates state-of-the-art deep trackers to real-time speed on a single-core CPU while maintaining almost the same tracking accuracy.

In model integration, this thesis investigates how to assemble multiple deep trackers so that the models complement each other. It proposes two ensemble algorithms: a multi-cue analysis strategy and a policy-based switch framework. The multi-cue analysis framework constructs multiple experts through correlation filters, and each of them tracks the target independently. With the proposed robustness evaluation strategy, the most suitable expert is selected for tracking in each frame. Furthermore, the divergence of multiple experts reveals the reliability of the current tracking result, which is quantified to update the experts adaptively and keep them from corruption. With the proposed multi-cue analysis, tracking performance is significantly improved. The policy-based switch framework aims to retain the performance advantage of the ensemble framework without sacrificing tracking efficiency. This algorithm consists of multiple weak but complementary experts and an agent network. By formulating the expert switch in consecutive frames as a decision-making problem, this approach learns an agent via reinforcement learning that directly decides which expert handles the current frame without running the others, which preserves the overall tracking efficiency. Extensive experiments verify the effectiveness of the proposed method.
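
The forward and backward trajectory consistency mentioned in the model-learning part of the abstract can be illustrated with a small sketch. This is not the thesis's training procedure: the abstract does not give the loss, so the toy tracker `track`, the function `forward_backward_error`, and the use of a Euclidean distance between the start position and the back-tracked position are all assumptions chosen for illustration.

```python
import math

def track(position, motion):
    """Hypothetical one-step tracker: move the target centre by a predicted motion."""
    return (position[0] + motion[0], position[1] + motion[1])

def forward_backward_error(start, forward_motions, backward_motions):
    """Track forward through the clip, then backward, and measure how far
    the back-tracked position lands from the starting position (assumed metric)."""
    pos = start
    for m in forward_motions:   # forward pass over successive frames
        pos = track(pos, m)
    for m in backward_motions:  # backward pass over the same frames in reverse
        pos = track(pos, m)
    return math.hypot(pos[0] - start[0], pos[1] - start[1])

start = (10.0, 20.0)
forward = [(2.0, 1.0), (3.0, 0.0)]

# A consistent tracker undoes its forward motion exactly.
consistent = forward_backward_error(start, forward, [(-3.0, 0.0), (-2.0, -1.0)])

# A drifting tracker returns to a different position, so the error is non-zero.
drifting = forward_backward_error(start, forward, [(-3.0, 0.0), (-1.0, -1.0)])

print(consistent, drifting)
```

In this toy setting a small error plays the role of the consistency signal: no ground-truth box is needed beyond the starting position, which is why such a signal can be computed on unlabeled video.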

Keywords/Search Tags:

Visual object tracking, Unsupervised learning, Model compression, Ensemble learning, Correlation filter, Siamese network

Related items

1	Research On Correlation Filter And Siamese Network Hybrid Algorithm For Visual Object Tracking
2	Object Tracking Based On Correlation Filter And Deep Model Compression
3	Visual Target Tracking Based On Optimized Ensemble Learning And Spatial Correlation Filter
4	Algorithm Study On Object Tracking Via Language And Visual Model
5	Regularization Methods For Non-Specific Visual Object Tracking
6	Research On Correlation Filter Based Visual Object Tracking With Deep Image Representations
7	Research On Target Tracking Based On Siamese Network And Correlation Filter
8	Research On Real-time And Robust Object Tracking Based On Correlation Filter And Siamese Network
9	Research On Visual Object Tracking Algorithm Based On Deep Learning
10	A Research Of Visual Tracking Algorithm Based On Deep Learning