Learning Dynamic Collaborative Graph For RGB-T Object Tracking

Posted on: 2019-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: N Zhao
GTID: 2348330542497647
Subject: Computer Science and Technology
Abstract/Summary:
Visual tracking aims to estimate the state of a target in a given video sequence. It is an important and fundamental research topic in computer vision and a core technology of intelligent video surveillance. In recent years, visual tracking has made many breakthroughs, and many tracking algorithms based on different theoretical frameworks have been proposed. These algorithms have significantly improved target tracking in both speed and accuracy. At the same time, a number of standard tracking datasets containing complex challenges have been made public to facilitate the performance evaluation of different object tracking algorithms. These efforts laid the foundations of visual tracking in both theory and application. Although they achieve good tracking performance, tracking algorithms that use a single modality cannot work well in many complex environments and extreme conditions, such as low illumination, partial occlusion, and severe smog. To overcome these problems, we compensate for the defects of the visible-light modality with an additional modality, i.e., the thermal infrared source. Combining the complementary benefits of the visible-light modality (RGB) and the thermal infrared modality (T) improves object tracking performance under complex conditions. The major works and contributions of this dissertation are as follows.

First, for RGB tracking, to address target drift in the tracking-by-detection framework, we propose a robust tracking algorithm based on the absorbing Markov chain. The proposed method builds on weighted fusion of image-patch information within the structured SVM tracking framework. The traditional weighted fusion method divides the target bounding box into uniform, non-overlapping small image patches and then assigns each patch a weight, in a semi-supervised manner, that represents the importance of the patch in describing the object: the greater the weight, the more likely the patch belongs to the target; the smaller the weight, the more likely it belongs to the background. In particular, this thesis uses the absorbing Markov chain, which jointly considers the appearance divergence and spatial distribution of salient objects and the background, and propagates the initial patch weights over the graph structure by exploiting the chain's absorption property. Because the initial seed points may contain noise, a seed point optimization algorithm is proposed to screen them, so that noisy seeds do not corrupt the result. Finally, the learned weights are integrated into the SVM-based tracking algorithm to improve its stability. Experiments on a publicly available dataset show that the proposed algorithm performs well.

Second, for RGB-T object tracking, existing datasets such as OSU-CT and LITIV cover only a single scenario, include few challenges, and contain few video frames, which makes them inadequate for fairly evaluating multi-modal target tracking. To establish a comprehensive standard benchmark for evaluating multi-modal object tracking algorithms, we contribute a video dataset for RGB-T tracking. Compared with existing ones, the new dataset has the following advantages: 1) It is large enough for large-scale performance evaluation (total number of frames: 210K; maximum frames per video pair: 8K). 2) The alignment between RGB-T video pairs is highly accurate and requires no pre- or post-processing. 3) Occlusion levels are annotated for analyzing the occlusion sensitivity of different methods. The dataset also includes challenges such as complex changes in low light, background clutter, and motion blur.

Third, for RGB-T object tracking, we propose a novel graph model, called the weighted sparse representation regularized graph, to learn a robust object representation from RGB and thermal data for visual tracking. In particular, the tracked object is represented as a graph with image patches as nodes. The edges of the graph encode the affinity between two connected image patches. This graph is dynamically learned in two respects. First, the graph affinity (i.e., graph structure and edge weights), which indicates the appearance compatibility of two neighboring nodes, is optimized based on the weighted sparse representation, in which a modality weight is introduced to fuse information from the different modalities adaptively. Second, each node weight, which indicates how likely the node belongs to the foreground, is propagated from the other nodes along the graph affinity. The optimized patch weights are then imposed on the extracted RGB and thermal features, and the target object is finally located by the structured SVM algorithm. Extensive experiments on both public and newly created datasets demonstrate the effectiveness of the proposed tracker and its superior performance over several state-of-the-art methods.
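
The patch-weighting idea in the first contribution can be illustrated with a small sketch. It is not the thesis's implementation: the function names, the scalar per-patch feature, the choice of border patches as absorbing states, the 4-neighbour graph, and the exponential affinity are all assumptions made for illustration. The sketch splits a bounding box into non-overlapping patches and computes, for every inner patch, the expected number of steps a random walk takes before it is absorbed at the border. A longer absorbed time is read as a higher likelihood of belonging to the target.

```python
import math


def split_into_patches(x, y, w, h, rows, cols):
    """Divide the box (x, y, w, h) into rows * cols non-overlapping patches."""
    pw, ph = w / cols, h / rows
    return [(x + c * pw, y + r * ph, pw, ph)
            for r in range(rows) for c in range(cols)]


def absorbed_times(features, sigma=0.5, iters=500):
    """Expected steps before a random walk from each inner patch is absorbed.

    Assumptions (illustrative, not taken from the thesis): `features` is a
    rows x cols grid (at least 3 x 3) with one scalar appearance value per
    patch, patches on the grid border are absorbing (background seeds), and
    the walk moves to a 4-neighbour with probability proportional to
    exp(-|f_i - f_j| / sigma).
    """
    rows, cols = len(features), len(features[0])

    def is_absorbing(r, c):
        return r in (0, rows - 1) or c in (0, cols - 1)

    def neighbours(r, c):
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                yield nr, nc

    transient = [(r, c) for r in range(rows) for c in range(cols)
                 if not is_absorbing(r, c)]
    times = {node: 0.0 for node in transient}
    # Fixed-point iteration for t = 1 + Q t, where Q holds the transition
    # probabilities between transient patches; absorbing patches add 0.
    for _ in range(iters):
        updated = {}
        for r, c in transient:
            nbrs = list(neighbours(r, c))
            affin = [math.exp(-abs(features[r][c] - features[nr][nc]) / sigma)
                     for nr, nc in nbrs]
            total = sum(affin)
            updated[(r, c)] = 1.0 + sum(
                a / total * times.get(n, 0.0) for a, n in zip(affin, nbrs))
        times = updated
    return times


def patch_weights(times):
    """Scale absorbed times to (0, 1]; a longer time suggests a target patch."""
    longest = max(times.values())
    return {node: t / longest for node, t in times.items()}
```

On a 4 x 4 grid whose four inner patches look different from the border, the walk tends to stay among the inner patches, so their absorbed times are longer than on a grid with uniform appearance. The fixed-point iteration is used here only to keep the sketch free of dependencies; solving the equivalent linear system (I - Q) t = 1 directly would give the same times.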
Keywords/Search Tags: Visual Tracking, Seed Optimization, Sparse Representation, Dynamic Graph Learning, Joint Optimization, Dataset