
Research On Object Tracking Based On Deep Learning

Posted on: 2019-12-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X K Lu
GTID: 1368330590970360
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the spread of surveillance and mobile devices in daily life, large amounts of video data need to be processed, and the demand for video analysis and video understanding is growing quickly. Video tracking is a basic task in video analysis that supports autonomous driving, person re-identification, crowd counting, and event detection. This dissertation focuses on object tracking, one of the most important tasks in computer vision. Visual tracking has many applications, for instance visual surveillance, human-computer interaction, medical imaging, and autonomous driving. Despite considerable progress in recent years, object tracking remains challenging for several reasons. First, the appearance of the target varies significantly because of deformation, motion, illumination change, and heavy occlusion. In addition, an appropriate update strategy is hard to design: frequent updating easily leads to tracker drift, while conservative updating prevents the tracker from adapting to changes in target appearance. Meanwhile, the target size changes in many cases.

In this dissertation, we explicitly address three issues in deep learning-based visual object tracking: (1) Most deep learning-based tracking algorithms cannot handle long-term tracking, especially tracking failures caused by occlusion and drift, so it is important to improve the robustness of deep learning-based tracking methods. (2) The speed of current algorithms is unsatisfactory, and real-time tracking is hard to achieve. (3) Current methods rely on simple feature combination strategies, so it is worthwhile to mine features from multiple layers for a better description of target appearance.

Specifically, the main contributions of this dissertation can be summarized as follows:

(1) Region proposal for correlation filter-based tracking. To deal with long-term occlusion and target scale variation, we bring region proposal, which has been used in object detection, into object tracking and propose a method called CFRP (Correlation filter with region proposal). We use a pre-trained convolutional neural network (CNN) for feature extraction and feed these features into correlation filters for tracking. We observe that it is hard to re-detect the target after long-term occlusion and that the target size must be estimated in a timely manner. To address these two issues, we design two-stream correlation filters with region proposal. In the first stream, region proposal generates a set of high-quality candidates; compared with the traditional sliding-window method, it produces more precise candidates. In the second stream, because the candidates generated by region proposal have different sizes, we use them to estimate the target size. To evaluate the effectiveness of the proposed method, we test our model on three public tracking datasets. The results show that our model handles long-term occlusion and scale change effectively.

(2) End-to-end object tracking with visual attention. Because most deep trackers are slow, we propose a novel deep learning-based tracker, TAAT (Transform Aware Attentive network for object tracking), to accelerate tracking. We characterize the change of the target between two consecutive frames as a spatial translation and a scale change. Based on this observation, we model the change with a spatial transformer network and build a Siamese network that takes the reference image and the search image as its two inputs. During offline training, a large number of labeled videos are used to train the network end to end; afterwards, the network is fixed and used for tracking directly. Because no model update is performed during tracking, the proposed network achieves real-time tracking.

(3) Deconvolution residual learning network for object tracking. We propose a tracking method called DSLT (Deconvolution reSidual Learning Tracking), which is built on residual learning to describe the target appearance. Different layers of a CNN have different properties: features from lower layers contain more spatial information, while features from higher layers carry more semantic information. However, the high-level feature maps are smaller, which is not beneficial for precise localization. We use the deconvolution operation and add skip connections between lower and higher layers to fuse the features and obtain a more robust representation.

Overall, this dissertation focuses on deep learning-based object tracking. To handle target occlusion and scale variation, a two-stream correlation filter method with adaptive region proposal is proposed. To address the unsatisfactory speed of deep trackers, a deep attentive tracker built on an end-to-end Siamese network is proposed. Meanwhile, to exploit features from multiple convolutional layers, we propose a deconvolution-based feature fusion network to describe target appearance. Extensive experiments on public tracking datasets demonstrate the effectiveness of the proposed methods.
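
For readers unfamiliar with correlation filter tracking, which underlies contribution (1), the following is a minimal sketch of the general idea and not the CFRP method itself. It assumes a single-channel raw-pixel patch instead of CNN features, uses a MOSSE-style closed-form filter in the Fourier domain, and omits region proposals and scale estimation. The function names and the parameters `sigma` and `lam` are illustrative choices, not taken from the dissertation.

```python
import numpy as np

def gaussian_response(shape, sigma=2.0):
    """Desired filter output: a Gaussian peak at index (0, 0), wrapped circularly."""
    h, w = shape
    ys = np.arange(h)
    xs = np.arange(w)
    ys = np.minimum(ys, h - ys)  # circular distance to row 0
    xs = np.minimum(xs, w - xs)  # circular distance to column 0
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return np.exp(-(yy ** 2 + xx ** 2) / (2.0 * sigma ** 2))

def train_filter(patch, sigma=2.0, lam=1e-2):
    """Closed-form filter (conjugate form) learned from one training patch.

    lam is a small regularizer that avoids division by near-zero spectra.
    """
    F = np.fft.fft2(patch)
    G = np.fft.fft2(gaussian_response(patch.shape, sigma))
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def detect(filter_conj, patch):
    """Correlate the filter with a new patch and return the estimated shift."""
    response = np.real(np.fft.ifft2(filter_conj * np.fft.fft2(patch)))
    dy, dx = np.unravel_index(np.argmax(response), response.shape)
    h, w = response.shape
    # Peaks past the midpoint correspond to negative (wrapped) shifts.
    dy = int(dy) - h if dy > h // 2 else int(dy)
    dx = int(dx) - w if dx > w // 2 else int(dx)
    return dy, dx, response

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.standard_normal((64, 64))                # stand-in for a target patch
    moved = np.roll(target, shift=(3, -5), axis=(0, 1))   # simulated target motion
    h_conj = train_filter(target)
    dy, dx, _ = detect(h_conj, moved)
    print("estimated shift:", (dy, dx))
```

In this toy setting the peak of the response map should recover the simulated shift of 3 rows and -5 columns. A practical tracker would additionally window the patch, update the filter over time, and, as in the abstract, rely on learned features and a re-detection mechanism for occlusion.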
Keywords/Search Tags: Object tracking, convolutional neural network, region proposal, correlation filter, spatial transformer network, end-to-end learning, deconvolution network, residual learning