Research On Robust Object Tracking Method Based On Multi-modal Video

Posted on: 2019-06-14
Degree: Master
Type: Thesis
Country: China
Candidate: C Li
Full Text: PDF
GTID: 2348330542497626
Subject: Computer technology
Abstract/Summary:
Visual tracking is one of the most classic and popular research topics in computer vision. It has been applied in many practical scenarios, such as intelligent surveillance, autonomous driving, and human-machine interaction, and it has great research and application value. Most existing trackers focus on visible light videos, and many machine learning algorithms have been introduced into the tracking community and have achieved good performance. However, many challenging factors, such as low illumination and bad weather, hinder further improvement of tracking performance. Hence, achieving robust visual tracking in extremely challenging environments is increasingly important and urgent.

Some recent works focus on multi-spectral visual tracking, which incorporates thermal video and visible light video and makes the two modalities complement each other for robust tracking even in extremely challenging environments. Specifically, thermal sensors still work at night, when RGB cameras fail. Conversely, thermal sensors are easily affected by thermal crossover, which occurs when the object and the background have similar temperatures; RGB cameras do not have this issue and can help tracking in this case. Hence, the two modalities can be used together for more robust visual tracking in these challenging environments. Some prior works have addressed the multi-modal visual tracking task and have also provided benchmarks for experimental comparison. These works provide a theoretical basis for further research and draw increasing attention to the task. To further advance multi-modal visual tracking, this thesis focuses on the following three points:

(1) A comprehensive multi-modal visual tracking dataset is constructed. Current multi-modal datasets all have limitations, such as insufficiently diverse scenarios and few challenging factors, so the performance of multi-modal object tracking algorithms cannot be evaluated objectively and comprehensively. This thesis introduces two sets of video capture equipment, which are used in combination to provide a more unified multi-modal video dataset for the later experiments, covering a variety of scenarios, multiple challenges, multiple object types, and so on. Only on the basis of a unified and comprehensive multi-modal video dataset can a multi-modal object tracking algorithm be evaluated objectively and reasonably.

(2) A multi-modal object tracking algorithm based on adaptive modality selection is proposed. To reduce the noise from low-quality modalities and improve tracking efficiency, for each modality the object region and its surrounding background region are divided into several sub-clusters by a clustering algorithm. The discriminative ability between object and background, which measures the quality of the modality, is then computed from the feature difference between their respective sub-clusters. The most reliable modality is selected according to this discriminative ability and is used to track the object with a correlation filter algorithm. At the same time, to keep the object model effective, this thesis employs a double-threshold strategy to update the models of all modalities.

(3) A new multi-modal object tracking framework based on deep reinforcement learning is proposed. Deep reinforcement learning is introduced into the field of multi-modal object tracking, and the multi-modal tracking problem is treated as a decision-making problem. Tracking proceeds by repeatedly selecting an action that dynamically moves the bounding box. Through the interaction between the agent and the environment, object localization and size change are merged into one framework, so that both tasks are completed at the same time through the choice of action.
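
The modality selection step in point (2) can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, the features, or the exact form of the discriminative ability, so everything below is an assumption made for illustration: plain k-means is used for clustering, the quality score is the mean distance from each object sub-cluster centre to its nearest background sub-cluster centre, and the names `kmeans_centers`, `modality_quality`, and `select_modality` are hypothetical. The correlation filter tracker and the double-threshold model update are not shown.

```python
import numpy as np


def kmeans_centers(features, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns k centres, shape (k, d)."""
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=float)
    # Initialise centres from k distinct feature vectors (requires len(features) >= k).
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Assign every feature vector to its nearest centre.
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old centre if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def modality_quality(object_feats, background_feats, k=3):
    """Hypothetical discriminative ability of one modality.

    Mean distance from each object sub-cluster centre to its nearest
    background sub-cluster centre; larger means object and background
    are easier to tell apart in this modality.
    """
    obj = kmeans_centers(object_feats, k)
    bg = kmeans_centers(background_feats, k)
    pairwise = np.linalg.norm(obj[:, None, :] - bg[None, :, :], axis=2)
    return float(pairwise.min(axis=1).mean())


def select_modality(regions, k=3):
    """Pick the modality with the highest quality score.

    regions maps a modality name to (object_feats, background_feats),
    each an array of shape (n_samples, n_features).
    """
    scores = {name: modality_quality(o, b, k) for name, (o, b) in regions.items()}
    return max(scores, key=scores.get), scores
```

Under these assumptions, calling `select_modality({"rgb": (rgb_obj, rgb_bg), "thermal": (th_obj, th_bg)})` returns the name of the modality whose object and background features are best separated, together with the per-modality scores. In a night scene where the RGB object and background features are nearly identical, the thermal modality would be expected to score higher.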
Keywords/Search Tags: Visible Light Video, Thermal Video, Modal Selection, Adaptive Tracking, Real-time Processing, Deep Reinforcement Learning