On Model Update Of Visual Object Tracking Based On Deep Learning

Posted on: 2022-02-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Li
GTID: 1488306575951609
Subject: Information and Communication Engineering
Abstract/Summary:
A video is a collection of visual observations of the same identities across different times and locations. Object tracking localizes objects and establishes the correspondence between them across time and space, and therefore plays a key role in video processing. As tracking proceeds, the tracker repeatedly refines the target model using information about the target gathered along the way. This process is called model update. On the positive side, model update lets the target model adapt to target variations and thus handle a variety of complicated situations. On the negative side, model update can be corrupted by noisy and erroneous information, which eventually leads to a degenerated model and tracking failure. It is therefore challenging to design model update algorithms that strengthen the tracker in complicated situations while avoiding model degeneration. Moreover, model update is a general process in sequential problems, so beyond object tracking it is also valuable for problems such as action recognition and online personal profiling.

Recently, breakthroughs in deep learning have pushed forward various research fields. In essence, deep learning replaces manual design with data fitting. For a long time, the model update process in object tracking has been manually designed, which relies on the designer's experience and is laborious, involving much trial and error. Moreover, such algorithms are tuned on small amounts of data and are thus not necessarily optimal on large-scale datasets. Most importantly, by training on large amounts of offline data, deep learning based models can learn to converge quickly and to differentiate useful information from noise, which makes them well suited to model update. In this context, this dissertation focuses on the model update problem in object tracking based on deep learning. Concretely, the research contents of this dissertation are as follows:

(1) Modeling the model update process with a meta-learning framework, which enables data-driven model update. Model update in object tracking is typically treated as an online learning problem: training samples are gathered during tracking, and the target model is learned with an algorithm such as stochastic gradient descent (SGD). SGD is manually designed and requires careful tuning of hyperparameters such as the learning rate and the number of iterations to meet the requirement of model generalization. Moreover, it needs hundreds of iterations of forward and backward computation and is hard to run in real time. This dissertation uses meta-learning to model the learning algorithm itself and learns it from large-scale offline video data. Given the sequential nature of the model update process, a recurrent neural network is used as the model updater. The trained model updater requires no manual hyperparameter tuning and only one forward computation to update the target model, and thus runs faster than real time, achieving 82 FPS. Experiments on several benchmarks show that the proposed method outperforms strong baselines such as EMA by 4% and SGD by 1% in terms of the Area Under Curve (AUC) metric.

(2) Splitting the target model into a fast model and a slow model, and discriminating visually similar distractors with a meta-learner for the fast model. One of the challenges in object tracking is discriminating distractors near the target, which is closely tied to the capability of model update. When there is no distractor, the target model should expand its decision boundary so that it covers dramatic variations of the target in viewpoint, illumination, pose and so on. When distractors are present, however, the boundary should shrink to separate the target from the distractors. To tackle this challenge, this work splits the target model into two components: a slow model, which ensures that the decision boundary encapsulates target variations, and a fast model, which is quickly updated to discriminate the target from distractors. Since the two components are mutually independent, the fast model can be updated aggressively without degenerating the target model as a whole. To model the target and distractors quickly, this work trains a meta-learner that uses an optimized ridge regression solver to learn classifiers of the target and distractors from few training samples. In terms of AUC, this work outperforms baselines by 4% in distractor-heavy scenarios and by 0.3% in general scenarios.

(3) Capturing the transient variations of the target and discriminating visually similar distractors based on global structure consistency (GSC) over a short period. The key observation in this part is that, over a short period, the global structure of the whole scene (including object correspondence and spatial layout) is consistent. This dissertation designs a neural network that implicitly models GSC and can thereby effectively handle transient variations of the target as well as visually similar distractors. Since the model is quickly updated to capture transient variations of the target, this work adopts a long-short term design to maintain the stability of the target model. The long-term module models the target over a long time span, while the short-term module models short-term variations. The two components are fused in the response map to ensure fine granularity and counteract some uncertainty. Moreover, because the two components share the feature extraction module, tracking efficiency is not compromised. The proposed method is empirically verified on several large-scale tracking benchmarks, where it outperforms the baselines by 2% in terms of AUC and demonstrates state-of-the-art (SOTA) performance.

(4) Designing a model update method based on lookahead information selection and verifying it on object tracking and action recognition tasks. In reality, not all frames in a video are helpful to the object tracking or action recognition task. Updating the model with nearly identical frames from adjacent time steps, or with background frames, overwrites useful information and harms recognition performance. On the other hand, some action categories are visually similar, and the discriminative information is present only for a short time. The model should be updated quickly to absorb this discriminative information. This work aims to design a model update method based on information selection that removes useless and harmful frames and keeps the critical ones. It designs a lookahead planning mechanism that looks ahead over upcoming frames and outputs a skipping plan. Specifically, it takes distinctiveness and importance into consideration and fuses these two cues to obtain the skipping plan. The proposed method can skip more than 80% of frames while achieving even better recognition performance.

In summary, this dissertation proposes a meta-learning based framework for modeling model update. For the flexibility and stability dilemma, it proposes a two-component model update mechanism and an information selection based model update mechanism. Model update based on deep learning is still at an early stage, and this dissertation provides some insightful explorations in this direction.
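
For readers unfamiliar with the hand-designed baselines named in contribution (1), the following is a minimal sketch of an EMA-style model update, assuming that EMA denotes an exponential moving average and that the target model is a plain feature vector. The function names `ema_update` and `run_updates`, the update rates and the toy data are all hypothetical illustrations, not taken from the dissertation. The sketch only shows why a fixed, manually chosen update rate creates the flexibility and stability dilemma that the abstract describes.

```python
def ema_update(model, sample, rate):
    """Blend a new target sample into the model with a fixed update rate.

    Assumed form: model <- (1 - rate) * model + rate * sample.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    return [(1.0 - rate) * m + rate * s for m, s in zip(model, sample)]


def run_updates(initial, samples, rate):
    """Apply ema_update once per frame over a sequence of samples."""
    model = list(initial)
    for sample in samples:
        model = ema_update(model, sample, rate)
    return model


# Toy 2-D "appearance" features (hypothetical data): the target drifts
# slowly for two frames, then the last frame is corrupted, for example
# because the tracker briefly locked onto a distractor.
initial = [1.0, 0.0]
frames = [[1.0, 0.1], [1.0, 0.2], [-1.0, 1.0]]

slow = run_updates(initial, frames, rate=0.05)  # stable, but adapts slowly
fast = run_updates(initial, frames, rate=0.8)   # adaptive, but absorbs the bad frame
```

With the small rate, the first component of the model stays close to the initial target after the corrupted frame; with the large rate, it is pulled most of the way toward the corrupted sample. No single fixed rate is right for both situations, which is the kind of manual trade-off that a learned updater is meant to remove.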
Keywords/Search Tags: object tracking, target model, model update, deep learning, meta learning, recurrent neural network