Research On Robust Visual Tracking Algorithms Considering Multiple Information

Posted on: 2021-11-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G J Li
GTID: 1488306458476914
Subject: Computer Science and Technology
Abstract/Summary:
Visual tracking aims to consistently track a target of interest through a video sequence. It plays a vital role in many artificial intelligence applications, e.g., intelligent surveillance, intelligent driving, and intelligent interaction. Although tracking precision and speed have improved substantially in recent years, designing an efficient and robust general-purpose tracking algorithm remains challenging. On the one hand, scarce prior information makes it difficult to train the appearance model offline. On the other hand, significant appearance changes caused by occlusion, deformation, and background clutter further complicate the problem. To address these difficulties, this dissertation aims to build robust appearance models, focusing on how to exploit and effectively fuse multiple visual cues and appearance information during tracking. It studies robust visual tracking algorithms that consider multiple kinds of information: spatial-temporal context information, multi-view information, and long-term and short-term memory information. The main contributions of this dissertation are summarized as follows:

(1) To counter the interference of background noise in the video scene, the dissertation proposes a structural context-aware visual tracking algorithm. The algorithm constructs both a local object dictionary and a local context dictionary to jointly represent target image patches. An Impact Allocation Strategy (IAS) then assigns adaptive positive impact factors to the local patches according to their discrimination ability. The spatial context is used to account for appearance differences among local patches, and the temporal context introduces historical information for more accurate localization. To ensure that no effective appearance change is missed when the dictionary is updated, a structural dictionary update scheme is presented for reliable model updating. By combining spatial and temporal context information, the proposed algorithm obtains a DP score of 75.3% and an AUC score of 54.6% on the OTB-50 dataset, which indicates satisfactory tracking performance.

(2) To cope with significant appearance changes during tracking, the dissertation proposes a multi-view visual tracking algorithm based on discriminative correlation filters. The algorithm considers multiple complementary views to increase the diversity of the target appearance. In the translation estimation stage, an adaptive multi-view collaboration strategy weights the contribution of each view by jointly considering its stability and discrimination; the final result is then determined by a linear combination of the response maps of all views. In the model updating stage, a memory-improved model update rule is introduced to keep the target model from being contaminated, and varying, independent learning rates are designed for the different views to alleviate the model drift problem. In the scale estimation stage, a distractor-aware scale update scheme is developed to avoid noisy scale estimates in case of temporary tracking failure. The proposed algorithm obtains a DP score of 82.9% and an AUC score of 62.9% on the OTB-50 dataset. Experimental results demonstrate that the proposed algorithm achieves excellent tracking performance.

(3) To alleviate the problem of model drift, the dissertation proposes a reliable visual object tracking algorithm based on a dual-memory selection (DMS) model. The algorithm maintains both a short-term memory and a long-term memory of the target appearance to balance model adaptivity and robustness. The proposed DMS model consists of four components: a short-term memory tracker, a long-term memory tracker, a memory evaluation criterion (MEC), and a memory selector. The short-term tracker emphasizes the recent target appearance and adapts well to rapid appearance changes. The long-term tracker maintains the memory of the historical target appearance and is robust to heavy occlusion. The memory selector fuses the long-term and short-term memory information by adaptively selecting the more reliable memory pattern for the current challenge. Moreover, introducing the temporal context into the reliability evaluation yields a stable, temporally continuous output. The proposed algorithm achieves a DP score of 86.5% and an AUC score of 66.4% on the OTB-50 dataset. Furthermore, extensive experimental results demonstrate that the proposed algorithm performs favorably against other correlation tracking algorithms on the deformation and occlusion attributes.

(4) To cope with long-term tracking, the dissertation proposes a long-term object tracking algorithm based on spatial-temporal reliability evaluation. The framework consists of three components: a correlation filter tracker, a coarse-to-fine re-detector, and an output integrator. It integrates spatial context and temporal context information to obtain more accurate re-detection results and reliability evaluation. Within this framework, the re-detector uses a coarse-to-fine detection strategy to refine unreliable tracking results. The output integrator evaluates the reliability of the correlation filter tracker and the coarse-to-fine re-detector and fuses their results to output a more accurate target position. The proposed algorithm achieves a DP score of 87.3% and an AUC score of 67.1% on the OTB-50 dataset. Experimental results demonstrate the effectiveness and superiority of the proposed tracking algorithm.
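As a rough illustration of the fusion step in contribution (2), where the final result is a linear combination of the response maps of all views, the sketch below fuses per-view response maps with normalized weights and takes the peak as the estimated position. It is only a minimal sketch, not the dissertation's implementation: the function names `fuse_response_maps` and `peak_location` are hypothetical, and the view weights are fixed placeholder numbers, whereas the abstract says they are derived adaptively from each view's stability and discrimination.

```python
def fuse_response_maps(response_maps, weights):
    """Return the weighted sum of equally sized 2-D response maps.

    Weights are normalized to sum to 1. In the abstract they come from an
    adaptive collaboration strategy; here they are supplied by the caller
    (an assumption made for illustration).
    """
    if not response_maps or len(response_maps) != len(weights):
        raise ValueError("need one weight per response map")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    rows, cols = len(response_maps[0]), len(response_maps[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for rmap, weight in zip(response_maps, weights):
        share = weight / total
        for r in range(rows):
            for c in range(cols):
                fused[r][c] += share * rmap[r][c]
    return fused


def peak_location(response_map):
    """Return (row, col) of the largest response value."""
    _, row, col = max(
        ((value, r, c)
         for r, line in enumerate(response_map)
         for c, value in enumerate(line)),
        key=lambda item: item[0],
    )
    return row, col


# Two hypothetical views that disagree about where the target is.
view_a = [[0.1, 0.9], [0.2, 0.3]]
view_b = [[0.8, 0.2], [0.1, 0.1]]

# Trusting view_a more moves the fused peak to its preferred location.
fused = fuse_response_maps([view_a, view_b], weights=[3.0, 1.0])
estimated_position = peak_location(fused)
```

Swapping the weights to favor `view_b` moves the fused peak to that view's maximum instead, which is the behavior an adaptive weighting scheme relies on when one view becomes unreliable.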
Keywords/Search Tags: Visual tracking, Long-term tracking, Sparse representation, Correlation filter, Spatial-temporal context, Multi-view learning, Long-short term memory, Model drift