
Research On Model Guided And Feature Enhancement Methods For Deep Visual Tracking

Posted on: 2020-04-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X H Wu
GTID: 1368330590972978
Subject: Computer application technology
Abstract/Summary:
Visual object tracking is one of the basic computer vision tasks, and it is a challenging problem because occlusion, deformation, distractors, low resolution, and background clutter cause large changes in object appearance. In recent years, correlation filter based (CF-based) and convolutional neural network based (CNN-based) tracking methods have significantly improved tracking performance: CF-based models improve efficiency, while CNN-based ones improve accuracy. However, existing explorations of their combination are limited. CF-based trackers adopt deep features extracted from pre-trained CNNs directly, or with only simple feature selection and feature fusion strategies, which limits the feature representation. On the other hand, other methods integrate a simple CF into the CNN as a differentiable layer. This fails to exploit the continuing improvements in CF models, and it cannot match the accuracy obtained by deploying advanced CF models on deep features pre-trained for classification and detection tasks. Achieving both high efficiency and high accuracy therefore requires further study of robust tracking models.

The tracking process generally consists of a motion model, feature extraction, an observation model, and model update, of which feature extraction and the observation model are the two key parts. The variety of target appearances, and the appearance changes during motion, make robust feature representation challenging. In discriminative tracking models, the observation model is designed to distinguish the target from the background, and accurate localization depends mainly on its discriminative ability. Existing methods have moved from hand-crafted features to deep features; however, both the use of features and the computational efficiency need to be improved. For the observation model, integrating advanced CF models into a CNN for joint learning is a meaningful direction. In this dissertation, we aim to develop more accurate and efficient trackers through feature enhancement and improved observation models. The contributions can be summarized as follows:

(1) To address the high computational cost of SVM-based trackers, which cannot meet real-time requirements, we combine a large-margin formulation with dense sampling and extend the Regularized Least Squares (RLS) based correlation filter to a large-margin dense classification model. We propose the Support Correlation Filter (SCF) model and an alternating optimization algorithm. Benefiting from the fast computation mechanism of correlation filters and the asymmetric squared hinge loss, the proposed method improves the ability to discriminate the target from the background while running in real time.

(2) Most methods directly use CNNs pre-trained for image classification on large-scale still-image datasets for feature extraction, ignoring whether such features suit image sequences. On the other hand, methods trained on video datasets adopt shallow, sequential CNN architectures for representation learning and use only the single output of the top layer as features. The learning and use of deep features for tracking should therefore be explored further. Based on the Fully-Convolutional Siamese Network (SiameseFC), we introduce Top-Down Modulation (TDM) for feature enhancement. By back-propagating high-level semantic information to guide the learning of shallow features, we obtain a single-layer feature that contains both detailed and semantic information. For the data imbalance problem, we further propose an improved hinge loss. By mining hard examples, it improves the discriminative ability of the deep features and the matching performance of the similarity function.

(3) Skip-connection based feature enhancement introduces a large number of parameters, which poses a challenge. To improve the representation of the top-layer feature and tackle nonuniform scale variation, we propose adaptive multi-factor dilated convolution for feature enhancement. With shared convolutional parameters, it extracts multiple feature maps with different receptive fields by controlling the dilation rate. Furthermore, we adopt an adaptive max-weighted feature fusion scheme to fuse the features with different receptive fields. It adaptively extracts the local features corresponding to different scale variations, enhancing the feature representation and the robustness to nonuniform scale variation.

(4) To address the lack of model adaptation in most deep trackers, we present a bi-level optimization formulation and propose an end-to-end framework that jointly performs correlation filter model guidance and feature learning. To integrate the advanced Background-Aware Correlation Filter (BACF) into a CNN, we unroll and truncate the alternating direction method of multipliers (ADMM) used to solve the BACF model, which can be interpreted as an updater network. By using information from the previous frame to compute the filter for target localization in the current frame, it achieves model adaptation. To improve the robustness of the learned tracker, we require that the filter obtained in the current frame also performs well on future frames. To learn the model, we use a greedy strategy to train the updater network and then jointly update it together with the feature extraction network. Through joint learning, both the feature representation and the observation model become more discriminative.
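Contribution (1) builds on the fact that an RLS (ridge-regression) correlation filter trained on all circular shifts of a sample has a closed-form solution in the Fourier domain. The sketch below, assuming a 1D single-channel signal and the standard (symmetric) squared hinge loss rather than the dissertation's asymmetric variant, shows how a large-margin loss can still reuse that fast solver: the squared hinge is rewritten as a minimum over a slack e >= 0 of (f - y(1+e))^2, so the algorithm alternates between an ordinary CF solve on adjusted targets and a closed-form slack update. All function names are hypothetical; this illustrates the idea and is not the SCF implementation.

```python
import numpy as np


def response(x, w):
    """Scores of filter w on every circular shift of x: ifft(conj(X) * W)."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(w)))


def ridge_cf(x, target, lam):
    """Closed-form RLS correlation filter: argmin ||response(x, w) - target||^2
    + lam * ||w||^2, solved independently at each frequency."""
    X = np.fft.fft(x)
    W = X * np.fft.fft(target) / (np.abs(X) ** 2 + lam)
    return np.real(np.fft.ifft(W))


def squared_hinge_cf(x, y, lam=1e-2, iters=20):
    """Alternating minimisation of sum(max(0, 1 - y*f)^2) + lam*||w||^2.

    y holds +1/-1 labels for each circular shift of x (assumed labelling).
    Uses max(0, 1 - y*f)^2 = min_{e>=0} (f - y*(1 + e))^2 for y in {-1, +1}.
    """
    e = np.zeros_like(y, dtype=float)
    history = []
    for _ in range(iters):
        w = ridge_cf(x, y * (1.0 + e), lam)   # w-step: plain CF solve, fixed e
        f = response(x, w)
        e = np.maximum(y * f - 1.0, 0.0)      # e-step: exact minimiser over e >= 0
        hinge = np.maximum(0.0, 1.0 - y * f)
        history.append(np.sum(hinge ** 2) + lam * np.sum(w ** 2))
    return w, history
```

With e initialised to zero, the first iteration is exactly the RLS correlation filter; later iterations only change the targets of samples already beyond the margin (y*f > 1), which is where the hinge loss differs from least squares. Because each step minimises the same objective exactly, the recorded loss never increases.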
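The abstract does not give the exact form of the improved hinge loss in contribution (2). One plausible reading, sketched below purely as an assumption, applies a hinge loss at each location of a SiameseFC score map, averages positives and negatives separately to counter the class imbalance, and keeps only the hardest fraction of negatives (online hard example mining). The function name, the margin and the `neg_keep` parameter are hypothetical.

```python
import numpy as np


def balanced_hard_hinge_loss(scores, labels, margin=1.0, neg_keep=0.25):
    """Hypothetical hinge loss on a Siamese score map with hard-negative mining.

    scores : (H, W) similarity map
    labels : (H, W) array with +1 at target locations and -1 elsewhere
    """
    per_loc = np.maximum(0.0, margin - labels * scores)  # hinge at every location
    pos = per_loc[labels > 0]
    neg = np.sort(per_loc[labels < 0])[::-1]             # hardest negatives first
    k = max(1, int(np.ceil(neg_keep * neg.size)))        # keep the top fraction only
    # Averaging each class separately stops the many easy negatives from
    # dominating the few positives.
    return pos.mean() + neg[:k].mean()
```

For a 4x4 map with one positive, all-zero scores give a loss of 2.0; raising a single negative score to 5 makes it the hardest example, and with `neg_keep=0.25` only it and three other negatives are averaged.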
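Contribution (3) relies on the fact that one kernel applied at several dilation rates yields several receptive fields without adding parameters. The NumPy sketch below assumes a single-channel 2D feature map and a fixed set of rates; `branch_weights` is a hypothetical stand-in for the adaptively learned fusion weights, and the fusion is a weighted element-wise max, which is only one possible interpretation of the adaptive max-weighted scheme.

```python
import numpy as np


def dilated_conv2d(x, w, rate):
    """'Same'-size 2D cross-correlation of x (H, W) with an odd k x k kernel w
    whose taps are spaced `rate` pixels apart (dilated convolution)."""
    k = w.shape[0]
    pad = (k - 1) * rate // 2          # half the effective kernel size
    xp = np.pad(x, pad)
    H, W = x.shape
    out = np.zeros((H, W), dtype=float)
    for i in range(k):
        for j in range(k):
            out += w[i, j] * xp[i * rate:i * rate + H, j * rate:j * rate + W]
    return out


def multi_dilation_max_fusion(x, w, rates=(1, 2, 3), branch_weights=None):
    """Apply ONE shared kernel at several dilation rates, then fuse the
    branches with a weighted element-wise max (hypothetical fusion rule)."""
    if branch_weights is None:
        branch_weights = np.ones(len(rates))
    branches = np.stack([a * dilated_conv2d(x, w, r)
                         for a, r in zip(branch_weights, rates)])
    return branches.max(axis=0)
```

Because every branch reuses `w`, adding a rate adds computation but no parameters: a 3x3 kernel at rate r covers a (2r + 1) x (2r + 1) window.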
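The idea behind contribution (4), running a fixed, truncated number of ADMM iterations so that each iteration can become a network stage, can be illustrated on a simplified BACF-style problem. The sketch assumes a 1D single-channel signal x of length T, a filter h with only D taps that is zero-padded to length T (the role of the cropping operator P in BACF), and the splitting g = P^T h; every step has a closed form, and the g-step is solved per frequency. It omits multi-channel features, the previous-frame information and all learned components, so it should be read as an illustration of an unrolled solver, not as the updater network itself. Function names are hypothetical.

```python
import numpy as np


def corr_response(x, g):
    """Circular cross-correlation response of filter g on x: ifft(conj(X) * G)."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(g)))


def bacf_admm_1d(x, y, D, lam=1e-2, mu=1.0, iters=50):
    """Fixed number of ADMM iterations for
        min_h 0.5 * ||y - corr_response(x, P^T h)||^2 + 0.5 * lam * ||h||^2,
    where P^T zero-pads the D-tap filter h to length T.
    Splitting: g = P^T h with an unscaled Lagrange multiplier zeta.
    """
    T = x.size
    X, Y = np.fft.fft(x), np.fft.fft(y)
    h = np.zeros(D)
    zeta = np.zeros(T)
    for _ in range(iters):
        p = np.zeros(T)
        p[:D] = h                                          # p = P^T h
        # g-step: per-frequency closed form of the quadratic subproblem
        G = (X * Y - np.fft.fft(zeta) + mu * np.fft.fft(p)) / (np.abs(X) ** 2 + mu)
        g = np.real(np.fft.ifft(G))
        # h-step: crop to D taps and shrink by the ridge term
        h = (zeta[:D] + mu * g[:D]) / (lam + mu)
        p = np.zeros(T)
        p[:D] = h
        zeta = zeta + mu * (g - p)                          # dual update on g = P^T h
    return h
```

In an unrolled network, `iters` becomes the truncated number of stages, and quantities such as `lam` and `mu` could be made learnable per stage. Run long enough, the loop approaches the direct ridge solution of the cropped-filter problem.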
Keywords/Search Tags: Visual Object Tracking, Convolutional Neural Network, Feature Enhancement, Correlation Filter, Model Guided