
When Correlation Filters Meet Convolutional Neural Networks For Visual Tracking

Posted on: 2019-08-19  Degree: Doctor  Type: Dissertation
Country: China  Candidate: D D Li  Full Text: PDF
GTID: 1368330611493009  Subject: Information and Communication Engineering
Abstract/Summary:
Visual tracking is a fundamental yet rapidly evolving research area with numerous applications, including video surveillance, autonomous driving, robotics, and augmented reality. It provides a fundamental component for high-level video understanding tasks such as motion analysis, event detection, and activity recognition. In general, visual tracking can be described as estimating the spatial trajectory of a target object in an image sequence, given its initial state, i.e., its location and underlying area. Despite significant progress in recent years, robust and efficient model-free tracking remains one of the most challenging problems in computer vision.

In recent years, trackers based on Discriminative Correlation Filters (DCF) and Convolutional Neural Networks (CNN) have achieved enormous popularity in the tracking community. By exploiting properties of the circular structure, DCF transforms sliding-window correlation in the spatial domain into element-wise operations in the frequency domain and achieves beyond-real-time frame rates. In contrast, CNNs benefit from end-to-end training and deliver high tracking accuracy and robustness in complex scenarios. Considering the complementary merits of DCF and CNN, we propose to integrate them into a unified framework to achieve robust, real-time visual tracking. The primary contributions and innovations of this dissertation are as follows:

1. A spatially weighted correlation filter based tracker. To overcome boundary effects and the rectangular shape assumption, this dissertation introduces target likelihood into the standard DCF formulation to discriminatively adjust the contribution of each filter value to circular correlation. An iterative optimization procedure based on the Preconditioned Conjugate Gradient (PCG) method is designed for filter training. Experiments on public benchmarks demonstrate that our approach with shallow convolutional features achieves state-of-the-art performance with improved frame rates on Graphics Processing Units (GPU).

2. Lightweight Convolution Operators (LCO) for fast tracking. To reduce model complexity and boundary effects in correlation filter based trackers, this dissertation introduces a spatial constraint and a feature projection matrix into the standard DCF formulation. The spatial constraint removes correlation filter values in the background area and guarantees a small filter size; the feature projection matrix performs feature dimensionality reduction to guarantee fewer feature dimensions. The correlation filter and the projection matrix are jointly solved with the PCG algorithm. Experiments on public benchmarks demonstrate that our approach achieves state-of-the-art performance while removing over 90% of the redundant trainable parameters in the tracking model.

3. A Real-time Complementary Tracker (RCT) that combines a DCF based tracker and a Siamese network based tracker (SiamFC) into a two-stage tracking framework. In this framework, DCF and SiamFC offer complementary advantages and compensate for each other's weaknesses. RCT first locates the target coarsely with SiamFC and then precisely with DCF. An automatic activation mechanism for the Siamese tracker is designed to achieve real-time speed without a GPU: SiamFC is activated only occasionally, based on the tracking status inferred from the correlation response map of DCF. Experimental results on public benchmarks demonstrate that RCT achieves state-of-the-art performance while running in real time without a GPU.

4. A Coarse-To-Fine Tracking (CTFT) framework that tracks the target with hierarchical convolutional features. The framework breaks visual tracking into two stages: first, CTFT locates the target coarsely with a deep convolution operator over a large search area; second, it refines this coarse location using a correlation filter with shallow convolutional features. With this two-stage design, CTFT retains a large target search area while maintaining the efficient element-wise solution of standard DCF. Experimental results on public benchmarks demonstrate that CTFT achieves an average tracking speed of 35.8 fps on a GPU and state-of-the-art tracking performance.

5. An end-to-end feature learning framework for CNN based trackers that tackles feature calibration and foreground-background data imbalance. A lightweight squeeze-and-excitation (SE) block is coupled to each convolutional layer to generate channel attention, which reflects the channel-wise importance of each feature channel and is used for feature weighting in online tracking. Focal loss is introduced into the loss layer to tackle the foreground-background data imbalance in network training: the proposed focal logistic loss down-weights the loss assigned to easy examples in the background area. Both the SE block and the focal logistic loss are computationally lightweight and impose only a slight increase in model complexity. Experimental results demonstrate that the enhanced tracker achieves significant performance improvement while running at a real-time frame rate of 66 fps.
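The circular-structure trick behind the DCF components above can be illustrated compactly: because circular correlation diagonalizes under the discrete Fourier transform, training a filter reduces to independent per-frequency ridge regressions with a closed-form solution. The sketch below is a minimal single-channel version in the style of MOSSE/KCF; it is not the dissertation's method, whose spatially weighted, multi-channel formulations have no such closed form and therefore require the iterative PCG solver. The function names (`gaussian_label`, `train_filter`, `detect`) are illustrative.

```python
import numpy as np

def gaussian_label(shape, sigma=2.0):
    """Gaussian regression target with its peak wrapped to (0, 0)."""
    h, w = shape
    ys = np.arange(h) - h // 2
    xs = np.arange(w) - w // 2
    g = np.exp(-(ys[:, None] ** 2 + xs[None, :] ** 2) / (2 * sigma ** 2))
    # Roll the peak from the patch center to the origin to match the
    # circular-shift convention of the DFT.
    return np.roll(g, (-(h // 2), -(w // 2)), axis=(0, 1))

def train_filter(patch, label, lam=1e-2):
    """Closed-form per-frequency ridge regression (single channel)."""
    X = np.fft.fft2(patch)
    Y = np.fft.fft2(label)
    # Sliding-window correlation becomes element-wise division in the
    # frequency domain: H = Y * conj(X) / (X * conj(X) + lambda).
    return Y * np.conj(X) / (X * np.conj(X) + lam)

def detect(H, patch):
    """Apply the filter to a search patch; return the response peak."""
    response = np.real(np.fft.ifft2(H * np.fft.fft2(patch)))
    return tuple(int(v) for v in np.unravel_index(np.argmax(response),
                                                  response.shape))
```

Detecting on a patch circularly shifted by (dy, dx) moves the response peak to (dy, dx), which is how a DCF tracker reads off the target displacement each frame. Every operation here is an element-wise product or an FFT, which is the source of the beyond-real-time frame rates the abstract mentions.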
Keywords/Search Tags: Visual Tracking, Discriminative Correlation Filter (DCF), Convolutional Neural Network (CNN), Siamese Network, Squeeze-and-Excitation Network, Focal Loss