Research on Correlation Filters Based on Spatial-Temporal Context Modeling for Visual Tracking

Posted on: 2021-04-03, Degree: Doctor, Type: Dissertation
Country: China, Candidate: F Li
GTID: 1488306569983939, Subject: Computer application technology
Abstract/Summary:
Given only the annotation of the target in the first frame, visual tracking aims to estimate the target's location and size in subsequent frames. Because little prior information about the target is available, and because its appearance often varies due to motion blur, deformation, and occlusion, designing an accurate and fast tracking approach remains challenging. The core of visual tracking is building robust and efficient target appearance models. Given the limited prior knowledge of the target, it is vital to build these models from the spatial-temporal context information in the tracking scene. In general, the spatial-temporal context consists of two parts: temporal context and spatial context. The temporal context comprises historical tracking results, the learned classifier models, and sample learning weights. The spatial context is defined as the background within a specified region surrounding the target; it not only provides samples for model learning but also improves the robustness of tracking.

Recently, correlation filter (CF)-based methods have received considerable research attention owing to their high efficiency, and have achieved state-of-the-art performance. Despite these advances, existing CF methods still have several drawbacks in designing robust appearance models from spatial-temporal context information. On the one hand, while CF methods can exploit spatial context through image pyramid strategies for scale estimation, they generally fail to handle changes in the target's aspect ratio during tracking, which limits tracking accuracy. Meanwhile, to alleviate the boundary discontinuity caused by the circular assumption on training samples, CF methods apply a cosine window during feature preprocessing, but neglect the risk that it suppresses the spatial context and contaminates samples. On the other hand, to handle target appearance variations during tracking, CF methods update their models with temporal information, but cannot simultaneously achieve efficient, low-memory, and adaptive model updating. In addition, how to adaptively exploit the spatial-temporal context for model learning under the CF framework remains under-investigated.

To address these issues, this dissertation builds on correlation filter theory and investigates how spatial-temporal context information can be used to design accurate and efficient target appearance models. The contributions of the dissertation are summarized as follows:

(1) To address aspect ratio variations of tracked objects, a novel tracking method is proposed that integrates boundary and center correlation filters. Besides tracking the target position with a center CF, a family of boundary CFs is introduced that uses the spatial context near the target to localize the four target boundaries, and can therefore adapt flexibly to changes in the target's aspect ratio. Furthermore, a near-orthogonal regularizer is suggested to integrate all CFs into a unified framework for joint training, leading to more powerful models. An optimization algorithm is also developed to solve the proposed model; each subproblem has a closed-form solution, which ensures fast convergence. Experimental results show that the proposed model can handle both scale and aspect ratio variations of targets during tracking, and performs favorably against state-of-the-art tracking methods.

(2) Existing CF methods all apply a cosine window for sample preprocessing, but neglect the risk that it suppresses the spatial context and contaminates samples. To address this, the dissertation studies the feasibility of, and strategies for, removing the cosine window from CF methods. The feasibility of removing the cosine window from CF methods with spatial regularization is first validated through extensive experiments. Then, two mask windows are suggested that reweight the estimation errors of samples so as to select appropriate samples within the spatial context regions, reducing the impact of boundary-discontinuous samples on model learning. When the proposed methods are incorporated into several representative CF methods with spatial regularization, the results show that they not only avoid suppressing the spatial context and contaminating samples, but also outperform their cosine-window counterparts on multiple datasets.

(3) Existing CF methods cannot simultaneously achieve efficient, low-memory, and adaptive model updating. To address this, the dissertation starts from a typical CF method and presents a novel spatial-temporal regularized correlation filter (dubbed STRCF). Benefiting from the proposed temporal regularization term, STRCF can adaptively incorporate temporal information into the learning of the current model, leading to more robust CF models. An ADMM algorithm is further proposed to solve the STRCF model, and it reaches real-time tracking speed on a single CPU. In addition, STRCF only needs to keep the CF model learned in the previous frame, and thus has lower memory overhead. Experiments on multiple datasets show that the proposed model achieves highly efficient, low-memory, and adaptive model updating, and performs on par with state-of-the-art trackers.

(4) Existing CF methods cannot adaptively exploit the spatial-temporal context for model learning and tracking. To address this, the dissertation presents an adaptive multiple spatial-temporal contexts correlation filter framework (dubbed AMCCF). In particular, a sigmoid spatial weight map is first suggested to control the influence of spatial context regions for more effective model learning. Based on this, multiple context regions of different sizes are modeled by incorporating spatial weight maps with different parameters into the CF models. To adaptively use the spatial-temporal context information during tracking, a temporal regularization term is proposed that draws on the historical sample learning weights; it is incorporated into an optimization model that jointly estimates the target position and assigns dynamic fusion weights to the response maps from the different CF models. Experiments show that the AMCCF method can adaptively exploit the spatial-temporal context for model learning, and performs favorably against state-of-the-art trackers.
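
As a rough illustration of the temporal regularization idea in contribution (3), the sketch below trains a single-channel, one-dimensional filter in the Fourier domain and pulls it toward the filter from the previous frame. It is not the dissertation's STRCF: the spatial regularization term is dropped (an assumption made here so that each frequency has a closed-form solution and no ADMM is needed), the response is modeled as a circular convolution to keep the conjugation simple, and all names (`train_filter`, `detect`, `lam`, `mu`, the toy signals) are hypothetical.

```python
import cmath
import math


def dft(signal, inverse=False):
    """Naive O(N^2) discrete Fourier transform (adequate for a short 1-D demo)."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(signal[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def gaussian_label(n, sigma=2.0):
    """Desired response: a circular Gaussian peaked at index 0."""
    return [math.exp(-min(t, n - t) ** 2 / (2 * sigma ** 2)) for t in range(n)]


def train_filter(x, y, lam=0.01, mu=0.0, prev_filter_hat=None):
    """Minimise, independently at each frequency k,
        |X_k F_k - Y_k|^2 + lam |F_k|^2 + mu |F_k - Fprev_k|^2,
    whose closed form is
        F_k = (conj(X_k) Y_k + mu Fprev_k) / (|X_k|^2 + lam + mu).
    Simplification (assumption): no spatial regularisation, single channel.
    """
    x_hat, y_hat = dft(x), dft(y)
    if prev_filter_hat is None:          # first frame: nothing to stay close to
        prev_filter_hat = [0j] * len(x)
        mu = 0.0
    return [
        (xk.conjugate() * yk + mu * pk) / (abs(xk) ** 2 + lam + mu)
        for xk, yk, pk in zip(x_hat, y_hat, prev_filter_hat)
    ]


def detect(filter_hat, z):
    """Response map for a new sample z; its argmax estimates the displacement."""
    z_hat = dft(z)
    response = dft([zk * fk for zk, fk in zip(z_hat, filter_hat)], inverse=True)
    return [r.real for r in response]


def shift(signal, s):
    """Circularly shift a list to the right by s positions."""
    s %= len(signal)
    return signal[-s:] + signal[:-s] if s else list(signal)


N = 32
template = [float(max(0, 4 - abs(t - 10))) for t in range(N)]  # toy "target"
label = gaussian_label(N)

# Frame 1: plain ridge-regression filter (no previous model exists yet).
f1 = train_filter(template, label)

# Frame 2: the target has moved 5 samples to the right; locate it.
response = detect(f1, shift(template, 5))
peak = max(range(N), key=response.__getitem__)

# Update on a re-centred, slightly dimmer sample. Only f1 is kept from the
# past, which mirrors the low-memory property described in the abstract.
template2 = [0.8 * v for v in template]
f2 = train_filter(template2, label, mu=1.0, prev_filter_hat=f1)
```

Here `mu` trades off fitting the new sample against staying close to the previous filter: `mu = 0` discards the history, while a very large `mu` effectively freezes the model at `f1`.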
Keywords/Search Tags: Visual Tracking, Correlation Filter, Target Appearance Model, Spatial-Temporal Context, Spatial Regularization