
Research On Correlation Filter Based Visual Object Tracking With Deep Image Representations

Posted on: 2019-08-21 | Degree: Master | Type: Thesis
Country: China | Candidate: M L J Li | Full Text: PDF
GTID: 2428330572952096 | Subject: Signal and Information Processing
Abstract/Summary:
Video plays an increasingly important role in today's information society. The information carried by video is far richer than that carried by text or still images, and the amount of video people create in their daily lives is growing at an astonishing rate. Faced with data at this scale, video websites hope to extract useful information from it automatically. Visual object tracking is a key component of this technology. Over the past decades, researchers have applied a wide range of machine learning methods to this problem and made considerable progress. However, when dealing with videos taken in natural scenes, trackers still suffer from changes in target appearance caused by motion blur, low resolution, and illumination change. To overcome these challenging conditions, this paper proposes two novel tracking methods based on the recently popular correlation filter tracker. Correlation filters can exploit a large number of samples efficiently, which makes them well suited to real-time applications. However, the bounding box predicted by a simple correlation filter tracker is often unsatisfactory, and the corresponding model is prone to drift. This paper focuses on these shortcomings of correlation filter trackers and makes the following contributions.

First, when the target undergoes large appearance changes, a long-term model is slow to react and its outputs may degrade. This paper proposes a dual-model framework that uses a long-term model and a short-term model simultaneously. The short-term model is learned from only the most recent frames, that is, frames close to the test frame. The target usually looks highly similar across these frames, so the classifier can easily distinguish it from the background and from distractors. The long-term model remembers historical frames and is therefore more reliable for classification, while the short-term model handles fine details well. Combining the two yields a tracker that is more stable and more accurate. For feature extraction, this paper uses the outputs of multiple convolutional layers to construct a hierarchical deep representation. Deeper layers capture more semantic information, while earlier layers have higher resolution. This feature, together with the shift invariance of convolutional networks, reduces the noise introduced by changes in the target. The approach achieves 86.6% mean distance precision and 68.0% mean overlap precision on the OTB2015 dataset, outperforming all compared state-of-the-art trackers.

Second, although it is convenient to embed a pretrained feature network into a tracking framework, optimizing the two parts separately is not guaranteed to find the global minimum in the solution space. This paper proposes an end-to-end method that uses a single network for both feature extraction and online tracking. For feature extraction, a Siamese network applies the same transformation to the exemplar and the search region. Cross-correlation is replaced by a linear layer through which gradients are passed back to the convolutional layers. The results predicted by the network are often not accurate enough, which introduces extra noise during sampling. To address this, this paper trains a bounding box regressor online using the first frames, and the output for each later frame is then refined once by this regressor. Experiments show that the boxes predicted by this approach are more accurate, achieving 71.4% mean overlap precision on OTB2015. The end-to-end tracking network also runs fast.
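For readers unfamiliar with the two scores quoted in the abstract, the sketch below shows how distance precision and overlap precision are commonly computed for one sequence. It is an illustration, not the thesis's evaluation code. The 20-pixel and 0.5 IoU thresholds are the usual OTB conventions and are assumed here, since the abstract does not state them. The `(x, y, w, h)` box format and all function names are hypothetical. The "mean" figures in the abstract would then be averages of such per-sequence values over the dataset.

```python
from math import hypot

def center(box):
    """Center (cx, cy) of a box given as (x, y, w, h)."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def distance_precision(pred, gt, threshold=20.0):
    """Fraction of frames whose center error is within `threshold` pixels.
    The 20 px default is an assumed convention, not taken from the thesis."""
    hits = 0
    for p, g in zip(pred, gt):
        (px, py), (gx, gy) = center(p), center(g)
        if hypot(px - gx, py - gy) <= threshold:
            hits += 1
    return hits / len(gt)

def overlap_precision(pred, gt, threshold=0.5):
    """Fraction of frames whose IoU with the ground truth exceeds `threshold`.
    The 0.5 default is an assumed convention, not taken from the thesis."""
    hits = sum(1 for p, g in zip(pred, gt) if iou(p, g) > threshold)
    return hits / len(gt)

# Toy four-frame sequence with made-up boxes, for illustration only.
gt = [(10, 10, 40, 40)] * 4
pred = [
    (10, 10, 40, 40),    # perfect prediction
    (15, 10, 40, 40),    # small shift: passes both tests
    (28, 10, 40, 40),    # 18 px shift: center is close, but overlap is low
    (100, 100, 40, 40),  # lost target
]

print(distance_precision(pred, gt))  # 0.75
print(overlap_precision(pred, gt))   # 0.5
```

The third frame shows why the two scores can differ: a box can sit near the right location and still overlap the target poorly, which is the kind of inaccuracy the bounding box regressor in the second method is meant to reduce.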
Keywords/Search Tags:Visual Object Tracking, Correlation Filter, Deep Learning, Model Ensemble, End-to-end Tracking