Research On Visual Tracking Algorithms With Low-Level And High-Level Feature Representations

Posted on: 2020-09-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L J Wang
GTID: 1368330572961902
Subject: Signal and Information Processing
Abstract/Summary:
Given the target location in the first frame, online visual tracking aims to locate the target in each subsequent frame. It is one of the fundamental techniques in computer vision and finds wide application in video surveillance, traffic monitoring, unmanned vehicles, and human-computer interaction, to name a few. It has also become a popular research topic. With advances in feature engineering, machine learning algorithms, and computing power, recent years have witnessed significant progress in visual tracking in terms of both efficiency and accuracy. Nevertheless, many challenging issues remain unsolved. For instance, it is very difficult for a single tracking algorithm to effectively handle target appearance variation caused by various factors. Effective offline training and online updating of appearance models is still an open question, and the temporal coherence of video remains under-explored. To alleviate these issues, this dissertation focuses on modeling target appearance with feature representations at different levels. The major contributions of this dissertation are summarized as follows:

First, a structure-constrained region grouping algorithm is proposed to better leverage low-level features. During the training stage, the structural relationships between neighboring superpixels are modeled by grouping prototype matrices. The grouping stage is performed in two steps. In the first step, neighboring superpixels are grouped from four directions under the constraint of the structure information encoded in the grouping prototype matrices. In the second step, the foreground is separated from the background by grouping the foreground superpixels identified in the first step into one foreground region. A new tracking method is further designed based on the proposed grouping algorithm, which casts object tracking as a binary classification problem over superpixels. The tracking target is then detected according to the foreground segmentation results. Experimental evaluations suggest that the proposed region grouping algorithm can effectively leverage the structure information and significantly benefit foreground segmentation, leading to more accurate object tracking.

Second, a new superpixel tracking algorithm based on spatial and temporal smoothness is proposed to enforce the spatial and temporal consistency of low-level features. Spatial smoothness is designed to enforce the geometric constraints between local target parts and to better exploit the manifold structure conveyed by unlabeled superpixels. Temporal smoothness is implemented by optical flow, which projects corresponding superpixels in the last frame to the current frame and captures short-term target appearance variation. An appearance fitness constraint is modeled by an online Random Forest classifier, which encodes the target appearance over the long term and provides prior information about the appearance. These three constraints are embedded within a graphical model and unified under one optimization framework, so accurate tracking is achieved by optimizing all three constraints simultaneously. Experimental results show that the proposed constraints considerably improve tracking accuracy and deliver reasonable foreground segmentation results even without ground-truth segmentation masks.

Third, to explore appearance representations based on high-level features, an analysis of deep features is conducted for online visual tracking. It is observed that convolutional layers at different levels characterize the target from different perspectives, and that only a subset of neurons is relevant to a given tracking target. Based on these observations, a new tracking algorithm with fully convolutional networks is proposed, in which both the top and lower layers are jointly used for target localization, guided by a background distracter detection mechanism. The top layer encodes more semantic features and serves as a category detector, while the lower layer carries more discriminative information and can better separate the target from distracters with similar appearance. A feature map selection method is also developed to remove noisy and irrelevant feature maps, which reduces computational redundancy and improves tracking accuracy. Extensive evaluation on a widely used tracking benchmark shows that the proposed tracker significantly outperforms state-of-the-art methods.

Finally, to improve the generalization of high-level features for online applications, a sequential training method for convolutional neural networks (CNNs) is proposed. A CNN is regarded as an ensemble, with each channel of the output feature map acting as an individual base learner. Each base learner is trained with a different loss criterion to reduce correlation and avoid overtraining. To achieve the best ensemble online, the base learners are sequentially sampled into the ensemble via importance sampling. To further improve the robustness of each base learner, we propose to train the convolutional layers with random binary masks, which serve as a regularizer that forces each base learner to focus on different input features. A scale prediction network is also designed to estimate target scales using deep features. Extensive experiments on two challenging benchmark datasets demonstrate that our tracking algorithm outperforms state-of-the-art methods by a considerable margin.
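The abstract does not give the grouping prototype matrices or the four-direction grouping rule of the first contribution, so the sketch below covers only its second step: merging superpixels that have already been classified as foreground into one region and deriving a target box from it. It is a minimal illustration under stated assumptions, not the dissertation's implementation: superpixels arrive as an adjacency dict (a hypothetical input format), the largest connected foreground group is taken as the target region, and the target box is the union of the member superpixels' boxes.

```python
from collections import deque


def largest_foreground_region(adjacency, is_foreground):
    """Return the largest connected group of foreground superpixels.

    adjacency: dict superpixel id -> list of neighbouring ids (assumed symmetric).
    is_foreground: dict superpixel id -> bool, e.g. from a per-superpixel
    binary classifier (hypothetical input format).
    """
    seen = set()
    best = set()
    for start in adjacency:
        if start in seen or not is_foreground[start]:
            continue
        # Breadth-first search that only walks through foreground neighbours.
        region = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adjacency[node]:
                if nb not in seen and is_foreground[nb]:
                    seen.add(nb)
                    region.add(nb)
                    queue.append(nb)
        if len(region) > len(best):
            best = region
    return best


def bounding_box(region, boxes):
    """Union of the member superpixels' boxes (x0, y0, x1, y1): the target box."""
    x0s, y0s, x1s, y1s = zip(*(boxes[i] for i in region))
    return min(x0s), min(y0s), max(x1s), max(y1s)


adjacency = {0: [1, 3], 1: [0, 2], 2: [1], 3: [0], 4: [5], 5: [4]}
is_fg = {0: True, 1: True, 2: False, 3: True, 4: True, 5: False}
boxes = {0: (10, 10, 20, 20), 1: (20, 10, 30, 20), 2: (30, 10, 40, 20),
         3: (10, 20, 20, 30), 4: (50, 50, 60, 60), 5: (60, 50, 70, 60)}
region = largest_foreground_region(adjacency, is_fg)
print(sorted(region), bounding_box(region, boxes))  # [0, 1, 3] (10, 10, 30, 30)
```

Superpixel 4 is also labeled foreground but is isolated, so keeping only the largest connected group discards it as a likely false positive.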
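The second contribution combines an appearance term, spatial smoothness, and temporal smoothness in one graphical model. A minimal sketch of such an objective follows, assuming binary superpixel labels, a per-superpixel foreground probability standing in for the online Random Forest output, labels carried over from the previous frame by optical flow (`prev_label`, a hypothetical input), and a similarity-weighted Potts penalty for spatial smoothness. The optimizer is plain iterated conditional modes (ICM), swapped in for illustration; the abstract does not say which optimization method the dissertation uses, and the weights `w_spatial` and `w_temporal` are arbitrary.

```python
import math


def energy(labels, fg_prob, neighbours, prev_label, w_spatial=1.0, w_temporal=1.0):
    """Energy of a binary superpixel labelling (1 = foreground); lower is better.

    fg_prob:    superpixel -> foreground probability (stand-in for the online
                Random Forest output; assumption).
    neighbours: (i, j, similarity) triples for adjacent superpixels.
    prev_label: superpixel -> label propagated from the previous frame by
                optical flow (hypothetical input; flow itself not computed here).
    """
    e = 0.0
    for i, y in labels.items():
        p = fg_prob[i] if y == 1 else 1.0 - fg_prob[i]
        e -= math.log(max(p, 1e-9))          # appearance fitness (unary term)
        if i in prev_label and prev_label[i] != y:
            e += w_temporal                  # temporal smoothness
    for i, j, sim in neighbours:
        if labels[i] != labels[j]:
            e += w_spatial * sim             # spatial smoothness (Potts)
    return e


def icm(fg_prob, neighbours, prev_label, iters=10, **weights):
    """Iterated conditional modes: greedily relabel one superpixel at a time."""
    labels = {i: int(p >= 0.5) for i, p in fg_prob.items()}
    for _ in range(iters):
        changed = False
        for i in labels:
            best = min((0, 1), key=lambda y: energy(
                {**labels, i: y}, fg_prob, neighbours, prev_label, **weights))
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels


fg_prob = {0: 0.9, 1: 0.45, 2: 0.8}
neighbours = [(0, 1, 1.0), (1, 2, 1.0)]
print(icm(fg_prob, neighbours, prev_label={1: 1}))  # {0: 1, 1: 1, 2: 1}
```

On appearance alone, superpixel 1 leans slightly toward background (0.45), but both spatial neighbours and its flow-projected previous label are foreground, so the smoothness terms flip it to foreground.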
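For the third contribution, the abstract says the top and lower convolutional layers are used jointly, guided by background distracter detection, but it does not give the switching rule. One simple way to realize that idea, offered here as an assumption: localize with the top-layer (category-level) heatmap unless it contains a second strong peak away from the main one, and in that case fall back to the more discriminative lower-layer heatmap. The threshold `ratio`, the neighbourhood `radius`, and all function names are hypothetical.

```python
def argmax2d(heatmap):
    """(row, col) of the largest value in a 2-D list."""
    _, r, c = max((v, r, c) for r, row in enumerate(heatmap) for c, v in enumerate(row))
    return r, c


def has_distracter(heatmap, peak, radius, ratio):
    """True if a location more than `radius` cells (Chebyshev distance) from
    `peak` responds above `ratio` times the peak value (assumed rule)."""
    pr, pc = peak
    top = heatmap[pr][pc]
    return any(
        v > ratio * top
        for r, row in enumerate(heatmap)
        for c, v in enumerate(row)
        if max(abs(r - pr), abs(c - pc)) > radius
    )


def localize(top_map, low_map, radius=1, ratio=0.5):
    """Use the top-layer (category-level) map unless it shows a distracter."""
    peak = argmax2d(top_map)
    if has_distracter(top_map, peak, radius, ratio):
        # Ambiguous category response: defer to the lower, more
        # discriminative layer.
        return argmax2d(low_map)
    return peak


low = [[0, 0, 0, 0], [0, 5, 6, 0], [0, 0, 0, 0], [0, 0, 0, 1]]
clean = [[0, 0, 0, 0], [0, 9, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
cluttered = [[0, 0, 0, 0], [0, 9, 0, 0], [0, 0, 0, 0], [0, 0, 0, 8]]
print(localize(clean, low), localize(cluttered, low))  # (1, 1) (1, 2)
```

The `cluttered` map mimics a second object of the same category responding in the top layer. That is the case the abstract assigns to the lower layer.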
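The abstract also leaves the feature map selection criterion unspecified. As a stand-in, the sketch below ranks channels by the fraction of their activation that falls inside the target box and keeps the top `k`. Channels that fire mostly on the background, or not at all, are discarded. The scoring rule, the list-of-lists layout, and the function name are assumptions for illustration only.

```python
def select_feature_maps(feature_maps, box, k):
    """Indices of the k channels whose activation is most concentrated in `box`.

    feature_maps: list of channels, each a 2-D list of non-negative activations.
    box: (r0, c0, r1, c1), half-open row/column range covering the target.
    Scoring by inside/total activation is an assumed stand-in criterion.
    """
    r0, c0, r1, c1 = box
    scored = []
    for ch, fmap in enumerate(feature_maps):
        total = sum(sum(row) for row in fmap)
        inside = sum(sum(row[c0:c1]) for row in fmap[r0:r1])
        # Silent channels score 0 and are treated as irrelevant.
        scored.append((inside / total if total > 0 else 0.0, ch))
    scored.sort(reverse=True)
    return sorted(ch for _, ch in scored[:k])


maps = [
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],  # diffuse: 1/9 inside
    [[0, 0, 0], [0, 5, 0], [0, 0, 0]],  # all activation on the target
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]],  # silent
    [[2, 0, 0], [0, 2, 0], [0, 0, 0]],  # half on the target
]
print(select_feature_maps(maps, box=(1, 1, 2, 2), k=2))  # [1, 3]
```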
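For the final contribution, the abstract says each output channel of the CNN acts as a base learner and that learners are sequentially sampled into the ensemble via importance sampling, but it does not define the sampling weights. The sketch below assumes weights proportional to `exp(-loss / temperature)`, so learners with lower loss are more likely to be drawn. It samples one learner at a time without replacement and averages the chosen learners' response maps. This loss-weighted sampling is a simplification chosen for illustration, not the dissertation's exact procedure, and both function names are hypothetical.

```python
import math
import random


def sample_ensemble(losses, size, temperature=1.0, rng=None):
    """Draw `size` base learners one at a time, without replacement.

    At each draw, learner i is picked with probability proportional to
    exp(-losses[i] / temperature) among those not yet chosen (assumed weighting).
    """
    rng = rng or random.Random()
    remaining = list(range(len(losses)))
    chosen = []
    for _ in range(min(size, len(losses))):
        weights = [math.exp(-losses[i] / temperature) for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        remaining.remove(pick)
        chosen.append(pick)
    return chosen


def ensemble_response(outputs, chosen):
    """Average the (flattened) response maps of the chosen base learners."""
    n = len(chosen)
    return [sum(outputs[i][p] for i in chosen) / n for p in range(len(outputs[0]))]


outputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # one response map per channel
chosen = sample_ensemble([0.2, 5.0, 0.4], size=2, rng=random.Random(0))
print(chosen, ensemble_response(outputs, chosen))
```

Because a learner is drawn with probability rather than by a strict top-k cut, a learner with a slightly higher loss can still enter the ensemble. This keeps some diversity among the selected channels.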
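The random binary masks of the final contribution can be pictured as follows. The sketch assumes, which the abstract does not state, that each base learner receives one fixed mask over its input channels, so different learners see different subsets of the input features. A 1x1 weighted sum at a single pixel stands in for the convolution, and `keep_prob`, `make_masks`, and `masked_response` are hypothetical names.

```python
import random


def make_masks(num_learners, num_inputs, keep_prob=0.5, seed=None):
    """One fixed random binary mask over the input channels per base learner.

    Assumed form of the regularizer: the mask is drawn once and reused for
    every training step, so each learner always sees its own subset of inputs.
    """
    rng = random.Random(seed)
    masks = []
    for _ in range(num_learners):
        mask = [1 if rng.random() < keep_prob else 0 for _ in range(num_inputs)]
        if not any(mask):  # guarantee at least one visible input channel
            mask[rng.randrange(num_inputs)] = 1
        masks.append(mask)
    return masks


def masked_response(weights, inputs, mask):
    """One learner's 1x1 'convolution' at a single pixel, with masked inputs."""
    return sum(w * x * m for w, x, m in zip(weights, inputs, mask))


masks = make_masks(num_learners=3, num_inputs=6, seed=0)
for m in masks:
    print(m, masked_response([0.5] * 6, [1.0] * 6, m))
```

The mask multiplies each input term, so a masked channel contributes nothing to the output and its weight receives zero gradient. In this formulation, that is what pushes each learner toward its own subset of features.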
Keywords/Search Tags: Visual tracking, deep convolutional neural networks, structure constraints, spatial-temporal smoothness, feature selection