
Studies On One-shot Based Deep Visual Tracking

Posted on: 2019-04-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Yao
Full Text: PDF
GTID: 2428330566498105
Subject: Computer Science and Technology
Abstract/Summary:
Visual tracking is one of the most challenging tasks in computer vision. In recent years, with the rapid development of deep learning, increasing attention has been paid to deep one-shot trackers. One-shot here means the tracker is trained offline on large amounts of data and then tracks without online adaptation, which yields real-time speed. However, almost all one-shot deep trackers use only the output of the last convolutional layer as the feature representation; this output is rich in semantics, but its low spatial resolution cannot meet the requirement of accurate localization during tracking. Moreover, the training set contains many easy samples. Although the loss of each easy sample is small enough to be ignored individually, in aggregate such samples can dominate the total loss and degrade both training and tracking performance. Taking these issues into account, we first propose two feature fusion methods motivated by the human visual pathway: the first adds response maps from different layers with different weights (a sketch follows this abstract), while the second is a top-down modulation that further considers the relations among layers. To handle the imbalance between the numbers of easy and hard samples, we further propose an online hard negative mining method with a hinge-based loss. Experiments on several popular benchmarks demonstrate the effectiveness of the proposed methods, although the results still fall short of the state of the art.

One-shot deep trackers cannot anticipate appearance and background variations in subsequent frames because they exploit no temporal information. We therefore design a one-shot manual annotation experiment and find that even humans, despite strong learning ability, cannot cope with distractors such as appearance variation and motion blur. The human annotations also fall far behind state-of-the-art trackers, which motivates us to take inter-frame information into consideration. We propose RTINet, a framework for the joint learning of deep representation and truncated inference in visual tracking, which sheds light on combining advances in deep representation learning and correlation filter (CF) modeling to improve performance. RTINet achieves favorable tracking accuracy against state-of-the-art trackers, and its fast version runs in real time at 24 fps.
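The weighted-fusion idea mentioned above can be illustrated with a minimal NumPy sketch. The function name, the per-layer weights, and the 17x17 map resolution are illustrative assumptions; the thesis does not specify them, and in the actual method the fused maps come from deep convolutional features rather than random data.

```python
import numpy as np

def fuse_response_maps(response_maps, weights):
    """Fuse per-layer correlation response maps by a weighted sum.

    response_maps: list of 2-D arrays, one per convolutional layer,
                   already resized to a common spatial resolution.
    weights: per-layer fusion weights (hypothetical values; how they
             are chosen is not specified here).
    """
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    fused = np.zeros_like(response_maps[0], dtype=np.float64)
    for w, r in zip(weights, response_maps):
        fused += w * r
    return fused

# Toy usage: three layers' response maps at a shared 17x17 resolution.
maps = [np.random.rand(17, 17) for _ in range(3)]
fused = fuse_response_maps(maps, weights=[0.2, 0.3, 0.5])
# The peak of the fused map gives the predicted target location.
peak = np.unravel_index(np.argmax(fused), fused.shape)
```

Normalizing the weights keeps the fused map's scale comparable regardless of how many layers are combined, so a higher weight on a deeper layer trades localization precision for semantic robustness.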
Keywords/Search Tags: Visual Tracking, One-shot Learning, Deep Learning, Correlation Filter