Visual Object Detection And Tracking Based On Deep Learning Technology

Posted on: 2017-05-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Z Zhou
GTID: 1318330566955709
Subject: Computer Science and Technology
Abstract/Summary:
Online object tracking is a long-standing and challenging task in computer vision. For a target object with abundant appearance information, a well-learned appearance model plays an important role in the overall performance of a tracking method. For a generic object, no useful prior knowledge is available, so a general appearance model may not precisely depict the appearance of one specific object, and may not adapt well to appearance changes over time and under varying conditions. In recent years, deep learning techniques and the associated deep neural networks have shown a strong ability for hierarchical feature learning and have achieved great success in computer vision tasks such as image classification, object recognition and scene understanding. Inspired by this ability, the work in this dissertation focuses on applying deep learning models to object tracking. With the aid of several deep neural networks, we address the appearance modeling problem for a generic object and try to learn its appearance variations across different environments as fully as possible. In addition, inspired by the hierarchical processing idea of deep learning, we propose a layered data association method to address trajectory estimation for small objects that lack significant appearance information.

Since the difficulty of training them was overcome, deep neural networks have been widely applied to machine learning tasks ranging from speech recognition to computer vision, and their feature learning ability has been confirmed repeatedly. Compared with traditional handcrafted feature descriptors, the hierarchical features learned by unsupervised pre-trained deep models are completely data-driven. Moreover, these features are almost independent of specific tasks and can be shared and reused. Based on this, we propose a deep neural network based appearance model that is learned without supervision by a stacked denoising autoencoder (SDAE) from a large amount of auxiliary image data. Each layer of the SDAE represents one level of feature abstraction, and the question of which layer's features are most suitable for appearance modeling in different environments is still open. We therefore propose an AdaBoost based online feature selection framework to fuse features from different layers. Our method provides a feasible way to exploit hierarchical deep features and achieves promising performance on challenging video sequences.

In contrast to fully connected feed-forward neural networks, convolutional neural networks (CNNs) are inherently better suited to images, and their great success in computer vision tasks indicates that they learn better features than feed-forward networks for image processing. We therefore propose a multi-layer CNN based appearance model to facilitate online object tracking. When CNNs are applied directly to an online system with a small training set, they are prone to overfitting and sensitive to unreliable training samples. To deal with this problem, we propose a particle posterior re-sampling method based on the Metropolis-Hastings algorithm. The proposed method not only reshapes the particle posterior into a more robust one, but also provides a novel sample acquisition scheme that yields a more reliable set of training samples. Our tracking method significantly improves tracking performance on an open benchmark.

Unlike DNNs and CNNs, recurrent neural networks (RNNs) excel at learning contextual dependencies and are a powerful sequence modeling tool. With the vanishing gradient problem addressed by long short-term memory (LSTM) cells, LSTM based recurrent neural networks have shown great potential in tasks where contextual information is essential, such as handwriting recognition, voice conversion and scene understanding. In our work, we treat object tracking as a sequence labeling problem solved by a bidirectional LSTM network (BLSTM-RNN). We represent the target object as a sequence of semantic sub-patches that captures the underlying spatial contextual relationships. The BLSTM-RNN model produces highly informative labeling results, on which we build a robust object position estimation method and a heuristic model updating strategy. Our method provides a novel way to handle object tracking with recurrent neural networks and shows competitive performance against several of the latest tracking methods.

For tracking small objects without significant appearance, deep models, although known for their outstanding feature learning ability, cannot be applied directly to build an appearance model. However, we are inspired by the idea of hierarchical processing in deep learning methods. We propose a layered data association method for robustly estimating the trajectories of small objects in clutter. In the local processing layer, we propose a shift token passing method to generate trajectorylets that conform to the constraints of a local motion model. In the global layer, a trajectorylet splicing algorithm based on a weighted directed acyclic graph searches for an optimal splicing path, which serves as the final trajectory. Our method has been applied to tennis ball tracking and achieves the best performance against several state-of-the-art methods on two real tennis game videos. The proposed method makes a better tradeoff between accuracy and recall rate.
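As an illustration of the global splicing layer, the search for an optimal path through a weighted directed acyclic graph can be sketched with dynamic programming over a topological order. This is a minimal sketch and not the dissertation's implementation: the abstract does not specify how nodes or edges are weighted, so the per-trajectorylet scores, the per-splice weights, and the function name `best_splicing_path` are all assumptions made for illustration.

```python
from collections import defaultdict


def best_splicing_path(node_scores, edges):
    """Return (total_score, path) for the maximum-weight path in a weighted DAG.

    node_scores: dict mapping a trajectorylet id to a score
                 (hypothetical quality measure, assumed here).
    edges:       dict mapping (u, v) to the weight of splicing u before v
                 (hypothetical compatibility measure, assumed here).
                 Both endpoints must appear in node_scores.
    """
    successors = defaultdict(list)
    indegree = {n: 0 for n in node_scores}
    for (u, v), weight in edges.items():
        successors[u].append((v, weight))
        indegree[v] += 1

    # Kahn's algorithm: produce a topological order of the nodes.
    order = []
    ready = [n for n in node_scores if indegree[n] == 0]
    while ready:
        u = ready.pop()
        order.append(u)
        for v, _ in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if len(order) != len(node_scores):
        raise ValueError("graph contains a cycle; a DAG is required")

    # best[n] is the best total score of any path that ends at n.
    best = dict(node_scores)
    previous = {n: None for n in node_scores}
    for u in order:
        for v, weight in successors[u]:
            candidate = best[u] + weight + node_scores[v]
            if candidate > best[v]:
                best[v] = candidate
                previous[v] = u

    # Walk back from the best end node to recover the path.
    end = max(best, key=best.get)
    path = []
    node = end
    while node is not None:
        path.append(node)
        node = previous[node]
    path.reverse()
    return best[end], path


if __name__ == "__main__":
    scores = {"a": 2, "b": 3, "c": 4, "d": 1}
    splices = {("a", "b"): 1, ("a", "c"): 2, ("b", "d"): 1, ("c", "d"): 0}
    print(best_splicing_path(scores, splices))
```

In a tracking setting, each node would stand for one trajectorylet produced by the local layer, and an edge would exist only where one trajectorylet can plausibly follow another in time; the recovered path is then the sequence of trajectorylets spliced into the final trajectory.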
Keywords/Search Tags: object tracking, appearance model, deep learning, deep neural networks, feature learning