
Research On Visual Object Tracking Based On Spatial And Temporal Context

Posted on: 2020-07-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z J Chen
Full Text: PDF
GTID: 1368330599961824
Subject: Information and Communication Engineering
Abstract/Summary:
Visual object tracking is one of the most important research areas in computer vision. Given the location and size of the target in the first frame, a visual tracker is expected to predict the accurate location and size of the target in each upcoming frame. Advanced tracking techniques benefit various applications such as robotics, visual enhancement, and surveillance. Although substantial progress has been made in visual object tracking, existing tracking algorithms still struggle to perform robustly in complex tracking scenarios.

Three main issues confront visual object trackers operating in complex tracking scenarios. First, severe occlusion may deteriorate tracking performance. Existing algorithms rely on a continuous online update process, so long-term and severe occlusion of the target tends to make the appearance model learn and express features of the occluding object, which worsens tracking performance. Second, deformation of the target also affects tracking accuracy, because the appearance model cannot correctly capture the appearance of a target undergoing rigid or non-rigid deformation; the tracker may easily drift to objects with similar appearance and then completely lose the target. Third, drifting degrades tracking performance. The tracker is susceptible to background noise, so the tracking result deviates from the location of the real target. This offset accumulates over time until the output of the tracker drifts to an unrelated background area, causing the tracking task to fail.

To solve these problems, this thesis incorporates spatial and temporal context to build robust trackers that can handle the challenging issues in complex tracking scenarios. In detail, the spatial context is the visual information that surrounds the tracking target, and the temporal context is what has been observed in the recent past in nearby areas. The
spatial context can be obtained by analysing the relationship between the target and its surroundings. It provides complementary information for the tracker to predict the state of the target and is helpful when dealing with scale changes and shape variations of the target. The temporal context is the information obtained from the visual context and object states of previous frames. It contains the moving pattern of the target, which complements the visual information extracted within the current frame. The temporal context is effective for overcoming issues such as heavy occlusion and cluttered background, in which the target may deliver too little visual information to be tracked accurately. Overall, the use of spatial and temporal context helps visual trackers build a robust appearance model for the target and thus improves tracking performance in situations such as occlusion or drifting. In this thesis, we study how to make better use of spatial and temporal context to design tracking algorithms that handle challenges like occlusion, shape variation, and drifting in complex tracking scenarios.

(1) To tackle the occlusion problem, we introduce a new template, called the Mask template, to build a robust appearance model for the target, and then design a novel tracking algorithm based on sparse representation. The Mask template preserves the temporal context through frame differences and can thus be used in conjunction with target templates to build an improved appearance model that is robust against corruptions such as occlusion. Our research makes three key contributions. a) The Mask template not only improves the appearance model of a target by introducing temporal context, but also reduces the complexity of the ℓ1-minimization problem when computing the coefficients of these templates. b) We develop a target state estimation module to obtain better motion estimation. c) We demonstrate that the Accelerated Proximal
Gradient (APG) algorithm is applicable to solving our tracking model. In practice, our algorithm not only improves tracking accuracy but also reduces the computational burden. In the experiments, we compare the proposed method with 21 state-of-the-art trackers. The results show that the proposed tracker remains robust under corruptions such as occlusion and illumination variation.

(2) To tackle the shape variation problem, we propose a tracker with a bi-channel structure. Using deep learning techniques, the tracker analyses the appearance change between frames and provides pixel-level outputs for general tracking tasks. The developed tracker combines two Fully Convolutional Network (FCN) branches to achieve robust tracking. More specifically, one FCN branch analyses low-level motion information, while the other extracts the high-level semantic change of the target in adjacent frames. The low-level FCN branch analyses optical-flow patterns to obtain local temporal information, i.e. the motion of each component of the target. The high-level FCN branch compares two successive frames and outputs the semantic change of the target between the two frames at the pixel level, e.g. a pixel belonging to the background area in the previous frame changing to the foreground target area in the current frame. Since the two branches share the same spatial locations, their outputs can be fused to better predict the state of the target. In addition, unlike most existing deep learning-based trackers, our tracker reduces the need to fine-tune the neural networks on the appearance of the target in the first frame, improving processing efficiency. We compare the proposed tracker with other trackers on DAVIS, a densely annotated dataset. Experimental results show that our model effectively improves tracking when the target rotates or changes its shape.

(3) We improve tracking accuracy by enhancing the quality of the proposals adopted in the
tracking-by-detection framework with a focus mechanism, which strengthens the tracker's ability to handle drifting. In the tracking-by-detection framework, proposals are selected windows indicating where the target may be present. They reduce the search space for a tracking algorithm when locating the target. Inspired by the biological visual attention mechanism, in which the focal point of the eyes gradually moves to the area of interest by analysing the spatial context, we likewise introduce spatial context to progressively improve the quality of proposals during tracking. Accordingly, we propose a focus-transfer model that is complementary to the existing tracking-by-detection framework. In particular, it gradually adjusts the location and size of proposals to help them better approximate the ground-truth state of the target. The use of this model is flexible: it can be built upon traditional hand-engineered features to augment tracking-by-detection algorithms that do not use deep learning techniques, and it can also be embedded within neural networks to improve CNN-based trackers. Experimental results show that our model effectively reduces the number of low-quality proposals and increases tracking accuracy in complex tracking scenarios, thus reducing the risk of drifting and losing the target.
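The abstract does not spell out the sparse-representation formulation behind part (1), but trackers in this family commonly represent a candidate patch as a sparse combination of template columns and solve an ℓ1-regularized least-squares problem with accelerated proximal gradient (FISTA-style) iterations. The following is only an illustrative sketch under those assumptions; the template matrix `T` (which would stack the target and Mask templates as columns), the candidate patch vector `y`, and the weight `lam` are hypothetical placeholders, not the thesis's actual model.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the l1 norm: shrink each entry toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def apg_l1(T, y, lam=0.1, n_iter=100):
    """Solve  min_c 0.5*||y - T c||^2 + lam*||c||_1  with an accelerated
    proximal gradient (APG/FISTA) scheme. T stacks templates as columns;
    y is a vectorized candidate patch; c holds the template coefficients."""
    L = np.linalg.norm(T, 2) ** 2            # Lipschitz constant of the smooth part
    c = np.zeros(T.shape[1])
    z, t = c.copy(), 1.0
    for _ in range(n_iter):
        grad = T.T @ (T @ z - y)             # gradient of 0.5*||y - T z||^2
        c_new = soft_threshold(z - grad / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = c_new + ((t - 1.0) / t_new) * (c_new - c)   # momentum extrapolation
        c, t = c_new, t_new
    return c
```

In a tracker of this kind, the candidate whose coefficients reconstruct it with the smallest residual on the target templates would be chosen as the new target state; the soft-thresholding step is what keeps the coefficient vector sparse.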
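Likewise, the focus-transfer model of part (3) is described here only at a high level. One common way to "gradually adjust the location and size of proposals" is an iterative bounding-box regression step in the standard (dx, dy, dw, dh) parameterization used by detection frameworks. The sketch below is an assumption, not the thesis's method: `predict_offset` is a hypothetical stand-in for whatever regressor (hand-engineered or CNN-based) actually produces the per-step adjustment.

```python
import numpy as np

def refine_proposal(box, predict_offset, n_steps=3):
    """Iteratively move a proposal box (cx, cy, w, h) toward the target.
    predict_offset is any callable returning (dx, dy, dw, dh), where the
    translation is relative to the box size and the scale update is in
    log space (so width and height stay positive)."""
    cx, cy, w, h = box
    for _ in range(n_steps):
        dx, dy, dw, dh = predict_offset((cx, cy, w, h))
        cx, cy = cx + dx * w, cy + dy * h        # shift relative to current size
        w, h = w * np.exp(dw), h * np.exp(dh)    # multiplicative size change
    return (cx, cy, w, h)
```

The log-space scale update is a standard design choice in this parameterization: it makes the size adjustment symmetric for growing and shrinking boxes and can never produce a negative width or height.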
Keywords/Search Tags:Visual Tracking, Deep Learning, Sparse Representation, Convolutional Neural Network, Object Proposal, Object Detection