Research On Methods For Video Object Tracking Based On Correlation Filters And Siamese Networks

Posted on: 2020-06-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y L Kuai
GTID: 1488306548491804
Subject: Information and Communication Engineering
Abstract/Summary:
Video object tracking has long been an active and difficult research topic in computer vision, with a wide range of applications such as video surveillance, unmanned driving, precise guidance and battlefield reconnaissance. This dissertation focuses only on single-camera, single-target, short-term, model-free tracking. The task is to estimate the state (such as location and size) of an arbitrary target in an image sequence, given only its initial location (a rectangular bounding box) in the first frame. The initial location is generated by a detector or annotated manually. Researchers have accumulated abundant results in the theory and applications of visual object tracking, but robust real-time tracking in complex environments remains challenging.

Since 2012, trackers based on Discriminative Correlation Filters (DCF) and on Siamese networks have gradually become the two mainstream approaches in video object tracking and have occupied the top rankings on multiple public benchmarks. DCF based trackers achieve dense sampling of the search area under a periodic assumption. By exploiting the properties of the resulting circulant structure, they can efficiently solve for the filter coefficients and locate targets. However, standard DCF trackers suffer from the boundary effect caused by the periodic assumption: the detection scores are accurate only near the center of the search area, which leads to a very restricted target search area at the detection step. Siamese network based trackers are trained offline, end-to-end, on massive annotated data to learn a similarity metric between samples, and are then used for online tracking. However, the distraction problem caused by semantic backgrounds and the simple modeling strategy for target templates (fixed or linearly interpolated) often lead to performance degradation.

To handle these problems, this dissertation proposes corresponding solutions to achieve robust real-time object tracking in visible-light video. Furthermore, it extends the theory of DCF and Siamese networks to RGBD and RGBT dual-modal video object tracking and proposes robust algorithms for each field. The main work and innovations are listed below:

1. An adaptively windowed correlation filter is proposed for visual tracking. To overcome the shortcomings of the cosine window and the regularization window that existing DCF based trackers use to mitigate the boundary effect, the algorithm adopts a Bayesian classifier to calculate an object likelihood map from the color histogram distributions of the target and the background. This map is then merged with the cosine window to generate the adaptive window. The adaptive window suppresses background information while highlighting the foreground object, thus adjusting the filter to focus more on the target area. Because the color histogram distributions are updated in every frame, the adaptive window can follow variations in target appearance. Experimental results show that adaptively windowed DCF trackers greatly improve the tracking accuracy of their baselines while achieving similar tracking efficiency.

2. A multi-layer feature joint learning method based on a Siamese network is proposed for visual tracking. To mitigate the distractions from semantic backgrounds in Siamese network based trackers, the proposed method designs a new network (Hyper-Siamese) that aggregates the hierarchical feature maps of SiamFC into hyper-feature representations of the target, based on the fact that different convolutional layers of deep networks characterize the target from different perspectives. The Hyper-Siamese network is trained offline, end-to-end, on the ILSVRC2015 dataset and later used for online tracking. By visualizing the outputs of different layers and comparing the tracking results under various layer concatenation modes, we show that the different convolutional layers are all useful for object tracking. Experimental results on public benchmarks demonstrate that the proposed algorithm performs favorably against many state-of-the-art trackers while maintaining real-time tracking speed. The proposed method also tracks targets well in the presence of similar background distractors.

3. To handle the semantic distraction and model updating problems in Siamese network based trackers simultaneously, a target objectness model and a target template model based on a Siamese network are proposed for visual tracking. The target objectness model computes a target likelihood map from color distributions, which is then applied as a mask to the response map so that the final response map focuses on the target. This increases the discrimination between the tracked target and the surrounding background, thus alleviating the distraction problem. The target template model uses a Gaussian mixture model to encode variations in target appearance. The proposed Gaussian model enhances diversity and simultaneously reduces redundancy among target samples. Experimental results on multiple benchmarks show that the proposed algorithm is both effective and efficient.

4. A correlation filter tracking method based on spatial feature reweighting and occlusion detection is proposed for RGBD tracking. To make full use of the way depth information complements the visible cues in RGBD data, we combine prior, depth and color information to compute a fine foreground-background segmentation map, building on the first contribution. This map is applied as a mask to the extracted features for adaptive reweighting, which adjusts DCF based trackers to focus on the area where the target is more likely to be. We also propose an occlusion detection and handling mechanism based on this segmentation map, which helps to detect an occlusion early and avoids contaminating the target model. The proposed framework is generic and can transform an arbitrary RGB tracker into an RGBD tracker. Experimental results demonstrate that DCF trackers integrated with our framework outperform their baselines and that our proposed tracker achieves the best performance on two existing benchmarks.

5. A twofold Siamese network is proposed for RGBT tracking. To integrate the complementary information from the RGB and thermal sources, we design a twofold Siamese network composed of an RGB branch and a thermal branch. Each branch performs tracking on the video of its own modality. The parameters of the RGB branch are transferred from SiamFC. The thermal branch is initialized with the weights of the trained SiamFC network and fine-tuned on constructed thermal image pairs to better capture the target characteristics in the thermal data. The algorithm further proposes two evaluation criteria to measure the confidence of response maps, and adaptively fuses the response maps from the two modalities to locate targets. This is the first time that end-to-end training based on deep learning has been introduced to the RGBT tracking field. Experimental results show that the proposed algorithm ranks second in MPR and first in MSR, achieving state-of-the-art tracking performance.
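
As an illustration of the adaptive window described in item 1, the following is a minimal sketch and not the dissertation's implementation. It assumes a joint RGB histogram with 16 bins per channel, a known foreground mask taken from the bounding box, and an elementwise product as the rule for merging the likelihood map with the cosine window; the abstract does not specify any of these choices. The function names `color_bin_index`, `object_likelihood` and `adaptive_window` are hypothetical.

```python
import numpy as np

def color_bin_index(patch, bins=16):
    """Map each uint8 RGB pixel to one joint-histogram bin index."""
    q = (patch.astype(np.int64) * bins) // 256  # per-channel bin in [0, bins)
    return q[..., 0] * bins * bins + q[..., 1] * bins + q[..., 2]

def object_likelihood(patch, fg_mask, bins=16, eps=1e-6):
    """Per-pixel P(object | color) from foreground/background histograms."""
    idx = color_bin_index(patch, bins)
    n_bins = bins ** 3
    fg_hist = np.bincount(idx[fg_mask], minlength=n_bins).astype(float)
    bg_hist = np.bincount(idx[~fg_mask], minlength=n_bins).astype(float)
    # Bayes rule with pixel counts as priors reduces to H_fg / (H_fg + H_bg).
    posterior = fg_hist / (fg_hist + bg_hist + eps)
    return posterior[idx]

def adaptive_window(patch, fg_mask, bins=16):
    """Merge the likelihood map with a cosine (Hann) window."""
    h, w = fg_mask.shape
    cosine = np.outer(np.hanning(h), np.hanning(w))
    likelihood = object_likelihood(patch, fg_mask, bins)
    window = cosine * likelihood  # assumed merge rule: elementwise product
    return window / (window.max() + 1e-12)

# Toy example: a red target centered on a blue background.
patch = np.zeros((32, 32, 3), dtype=np.uint8)
patch[..., 2] = 255
patch[8:24, 8:24] = (255, 0, 0)
mask = np.zeros((32, 32), dtype=bool)
mask[8:24, 8:24] = True
window = adaptive_window(patch, mask)
```

In this toy case the window is zero over the blue background and keeps the cosine taper over the red target. In a tracker, the two histograms would be re-estimated in every frame, as the abstract describes, so that the window follows appearance changes.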
Keywords/Search Tags: Video Object Tracking, Discriminative Correlation Filters (DCF), Siamese Network, Color Histogram Distribution, Semantic Distractions, Model Update, Dual-modal Video