Research On Visual Object Tracking Based On Sparse Representation And Correlation Filter

Posted on: 2021-03-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z G Song
GTID: 1488306464981189
Subject: Information and Communication Engineering

Abstract/Summary:
Visual object tracking is one of the most fundamental research topics in computer vision. Its core task is to estimate the trajectory and motion state (i.e., size, location, orientation, etc.) of a target in a video sequence, which in turn provides important data for subsequent high-level visual tasks. Research interest in object tracking stems from its many applications, such as intelligent transportation, video surveillance and human-machine interaction. Despite remarkable progress in recent years, developing a robust tracking algorithm remains challenging because of dramatic appearance changes caused by factors such as illumination variation, occlusion, fast motion and background clutter. This thesis studies object tracking methods based on sparse representation and correlation filtering, and focuses on three main problems: constructing a robust appearance model, measuring the reliability of the tracking result, and alleviating model drift. The main research contents and innovations are as follows:

(1) A novel tracking model combining global and multi-scale local sparse representation is proposed. Tracking models based on sparse representation can be roughly classified as global or local. A global sparse representation model captures the overall information of a target and is robust to holistic appearance variations such as illumination and pose changes. A local sparse representation model divides the target into a series of small image patches, captures local information about the target, and is effective in challenging scenes such as partial occlusion and local deformation. To model target appearance effectively, this thesis proposes an appearance model that exploits the advantages of both global and local sparse representation. In existing local sparse representation models, the target is usually divided into patches at a single fixed scale. To capture local information at different scales, the target is instead divided into local patches at several scales, and the patch-based sparse representation histograms computed at each scale are used to model the target appearance. The histogram intersection function is used to compute the similarity between the candidate and template histograms for each patch scale separately. Based on a descriptor of corrupted patches, weight coefficients are designed for the similarities at the different scales, and their weighted sum is used as the final similarity measure of the local model. Experimental results demonstrate the effectiveness of the proposed model.

(2) A tracking method based on patch descriptors and structural local sparse representation is proposed. Trackers based on the structural local sparse appearance model do not consider the differing states of the individual patches, which can degrade tracking performance when the appearance of the target's patches varies inconsistently. To address this issue, patch descriptors are designed to reflect the degree to which each patch is contaminated by noise caused by appearance changes. First, the object is divided into multiple non-overlapping patches, and the patch sparse coefficients are obtained by structural local sparse representation. Second, each patch is further decomposed into several sub-patches. The descriptor of a patch is defined as the proportion of its sub-patches whose reconstruction error is below a given threshold. Finally, the target appearance is modeled by the patch descriptors together with the patch sparse coefficients. Furthermore, to adapt to appearance changes of the target and alleviate model drift, an outlier-aware template update scheme is introduced.

(3) The kernelized correlation filter (KCF) tracker has received much attention in recent years due to its computational efficiency and excellent performance. However, this type of tracker still has shortcomings: it uses only one type of feature, and it lacks both a scale estimation module and a mechanism for judging tracking reliability. To address these issues, we propose a collaborative correlation filter tracking framework with online re-detection. Correlation filters built on different features differ in their ability to distinguish the object from its surrounding background in different scenes. We therefore first learn four kernelized correlation filters independently, one for each of four feature types, and take the element-wise product of their response maps as the final response map. With this strategy, noise or errors in one response map can be suppressed by the other response maps, which yields a more reliable final response map and improves localization accuracy. We then learn an independent scale filter to estimate the target scale. Furthermore, we employ an online detector to further enhance the performance of the proposed tracker. When the tracking result is unreliable, the online detector is activated to re-detect the object in a local neighboring region and correct the tracking result.

(4) A real-time kernelized correlation filter tracking method with multiple feature integration, based on low-dimensional features, is proposed. Multi-channel feature integration can improve the performance of a correlation filter tracker. However, the computational time of the correlation operation increases linearly with the feature dimension. To reduce the computational cost, a dimensionality reduction strategy based on principal component analysis is applied to the multi-channel features. Unlike the standard KCF tracker, the proposed method adopts a coarse-to-fine search strategy with two KCF trackers for translation estimation. In the coarse-grained search stage, a translation filter with a larger padding size is used to localize the target. The large search region enables KCF to cope better with fast motion and to recover from potential drift. Starting from the coarsely detected location, a fine-grained search is then carried out by a second translation filter with a smaller padding size, which localizes the target position more precisely. In addition, to address model drift caused by updating the model with corrupted samples, a confidence metric based on the maximum value of the correlation response and the average peak-to-correlation energy is used to measure the reliability of the tracking result. When the confidence metric falls below a threshold, the model update is suspended to alleviate model drift.

(5) A method for designing a robust tracker by combining correlation filters with a convolutional neural network is proposed. The Conv3-4, Conv4-4 and Conv5-4 layers of the pre-trained VGG19-Net are used to extract convolutional features of the target, and three discriminative correlation filters are trained independently, one for each convolutional layer. During tracking, we compute the response of each filter and take the product of the response maps as the final response map for object localization. To adapt to changes in object appearance, each correlation filter model is updated by linear interpolation. To address model drift, we design a confidence metric based on the peak-to-sidelobe ratio, the overlap rate of tracking boxes and the smoothness of the tracker's trajectory, and use it to adjust the learning rate of the model update.
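The response-map fusion and confidence-gated update described in items (3) to (5) can be illustrated with a minimal Python sketch. It is not the thesis implementation. The function names (`fuse_responses`, `apce`, `update_model`, `gaussian_peak`), the learning rate and both thresholds are hypothetical values chosen for illustration. The APCE formula used here is the commonly cited definition, which the abstract does not spell out. Clipping negative responses before taking the product is an added assumption, and the learning-rate adjustment of item (5) is not modeled: the sketch only shows the simpler "skip the update" rule of item (4).

```python
import numpy as np


def fuse_responses(responses):
    """Fuse same-shaped response maps by element-wise product."""
    fused = np.ones_like(responses[0], dtype=float)
    for r in responses:
        # Assumption: negatives are clipped so that two negative
        # responses cannot multiply into a spurious positive peak.
        fused *= np.clip(r, 0.0, None)
    return fused


def apce(response):
    """Average peak-to-correlation energy of a response map.

    Commonly cited form: (max - min)^2 / mean((response - min)^2).
    """
    f_max, f_min = response.max(), response.min()
    energy = np.mean((response - f_min) ** 2)
    return (f_max - f_min) ** 2 / (energy + 1e-12)


def update_model(model, new_model, response,
                 lr=0.02, max_thresh=0.3, apce_thresh=20.0):
    """Linear-interpolation update, skipped when confidence is low.

    lr, max_thresh and apce_thresh are illustrative values only.
    """
    if response.max() < max_thresh or apce(response) < apce_thresh:
        return model  # unreliable frame: keep the old model
    return (1.0 - lr) * model + lr * new_model


def gaussian_peak(shape, center, sigma=2.0):
    """Synthetic response map with a single Gaussian peak."""
    ys, xs = np.mgrid[:shape[0], :shape[1]]
    d2 = (ys - center[0]) ** 2 + (xs - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))


if __name__ == "__main__":
    shape = (32, 32)
    clean = gaussian_peak(shape, (16, 16))
    # Second map has a stronger distractor peak away from the target.
    noisy = 0.6 * gaussian_peak(shape, (16, 16)) + gaussian_peak(shape, (5, 5))
    fused = fuse_responses([clean, noisy])
    print("noisy peak:", np.unravel_index(noisy.argmax(), shape))
    print("fused peak:", np.unravel_index(fused.argmax(), shape))
    print("APCE of fused map:", apce(fused))
```

In the synthetic example, the second map alone would place the target at the distractor, while the product keeps the peak that both maps agree on. A flat, ambiguous response map has an APCE near zero, so `update_model` returns the previous model unchanged in that case.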
Keywords/Search Tags: Visual object tracking, Sparse representation, Correlation filter, Multi-channel feature, Convolutional neural network, Model drift