Research On Deep Residual Learning-Based Visual Object Tracking Algorithm

Posted on: 2020-05-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Liu
GTID: 1368330599952732
Subject: Software engineering
Abstract/Summary:
Visual object tracking simulates the ability of the biological vision system to follow a moving object, and it is a key task in computer vision. The core research problem is to accurately estimate the position, velocity, and state of the target object across a continuous video image sequence. Although visual object tracking has made progress, it remains challenging, mainly because the object's appearance changes significantly under occlusion, deformation, fast motion, illumination changes, background clutter, and similar factors. A tracking algorithm must accurately identify these changes and locate the target object in each video frame. This thesis analyzes and discusses the visual object tracking task, establishes three visual object tracking frameworks based on deep residual learning, and evaluates them on benchmark datasets. The main research contents and contributions are summarized as follows:

(1) A visual object tracking algorithm based on deep spatiotemporal residual learning and correlation filters

Recently, a growing number of works on visual object tracking combine spatial and temporal features with the object appearance model, so that the model can better adapt to appearance changes caused by temporal and spatial variation in video image sequences. To improve tracking performance, a visual object tracking algorithm based on a spatial-temporal residual network architecture and correlation filters is proposed and named the STResNet_CF tracker. The object appearance model is built on the original residual network architecture and combines spatial with temporal features. The spatial and temporal sub-networks are then trained and integrated simultaneously, yielding static spatial features related to the object's appearance in a single video frame and dynamic features across consecutive video frames. In this way the spatial and temporal features complement and benefit from each other. Finally, the deep spatial-temporal features are fed into correlation filters to achieve accurate and robust visual object tracking. The experimental results show that the STResNet_CF tracker achieves similar or better performance compared with the other trackers.

(2) Deep multi-scale spatiotemporal residual learning for robust visual object tracking

In visual object tracking, besides the spatial-temporal features of the video sequence that describe the object's appearance changes, multi-scale features are also very important for describing the target object exactly. As the object moves, its apparent scale changes with its distance from the camera: it appears larger as it approaches the camera and smaller as it moves away. Accurately identifying scale variations is therefore important for improving tracking performance. To identify the scale variations of the target object effectively, in addition to the two sub-networks added to the original residual network for spatial-temporal features, a skip connection is added from the output of each residual unit to the next residual unit, and a multi-scale factor is added in each residual unit. This enhances the multi-scale representation ability of the residual network, so that tracking precision, accuracy, robustness, and success rate can be further improved. The improved network is named the multi-scale spatial-temporal residual network, and the tracking algorithm based on it is named the MSST-ResNet tracker. The tracker can robustly identify scale, shape, and other appearance changes of the target object in a successive video image sequence, and can fully utilize the temporal sequence information related to the object's motion. Finally, the deep multi-scale spatial-temporal features are fed into kernelized correlation filters to accurately locate the target object in each video frame. The experimental results demonstrate that the MSST-ResNet tracker can track the target object accurately and robustly in real time, even when its appearance changes significantly. Moreover, the MSST-ResNet tracker is superior to the existing trackers.

(3) A visual object tracking method based on deep multi-scale spatiotemporal residual learning and the tracking-learning-detection framework

We focus on three aspects of the visual object tracking algorithm: learning deep multi-scale spatiotemporal features online, updating the detector dynamically, and tracking the visual object online. A novel visual object tracking method, named the MSSTResNet-TLD tracker, is proposed based on the deep multi-scale spatial-temporal residual network architecture and the online tracking-learning-detection framework. The goal is to track, learn, and detect online, in real time, a target object of interest given in the initial video frame. An effective method is established to continuously evaluate and update the tracker, classifier, and detector based on deep multi-scale spatiotemporal residual learning. Deep multi-scale spatiotemporal features are learned from the historical image sequence, so that the target can be effectively distinguished from its surrounding background in each video frame and accurately detected and tracked. The experimental results show that the MSSTResNet-TLD tracker not only outperforms the state-of-the-art trackers in terms of accuracy and robustness, but also achieves real-time tracking performance on a CPU.
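
The abstract does not spell out how features are "fed into correlation filters", so the following is only a minimal sketch of the general idea, not the thesis's implementation. It assumes a linear, single-channel, MOSSE-style filter trained in closed form in the Fourier domain, and it works on a hand-made 1-D feature vector that stands in for the deep features. The kernelized filters used by MSST-ResNet are not reproduced here. All function names (`dft`, `gaussian_label`, `train_filter`, `locate`) and the parameters `sigma` and `lam` are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; adequate for a short demo signal."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def gaussian_label(n, center, sigma=2.0):
    """Desired filter response: a circular Gaussian peak at the target position."""
    label = []
    for i in range(n):
        d = min(abs(i - center), n - abs(i - center))  # circular distance
        label.append(math.exp(-0.5 * (d / sigma) ** 2))
    return label


def train_filter(feature, center, lam=1e-2):
    """Closed-form regularized filter in the Fourier domain (assumed MOSSE-style).

    Per frequency: H = G * conj(F) / (F * conj(F) + lam), where F is the
    transformed feature and G the transformed Gaussian label.
    """
    F = dft(feature)
    G = dft(gaussian_label(len(feature), center))
    return [g * f.conjugate() / (f * f.conjugate() + lam) for f, g in zip(F, G)]


def locate(filter_hat, feature):
    """Apply the filter to a new feature vector and return the index of the peak."""
    Z = dft(feature)
    response = dft([h * z for h, z in zip(filter_hat, Z)], inverse=True)
    return max(range(len(response)), key=lambda i: response[i].real)
```

For example, training on a vector whose target sits at index 10 and then calling `locate` on the same vector shifted circularly by 5 samples returns index 15, because a circular shift of the input shifts the response peak by the same amount. A real tracker would apply the same idea to 2-D, multi-channel feature maps and update the filter over time.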
Keywords/Search Tags: visual object tracking, convolutional neural network, residual learning, spatial-temporal feature, correlation filter