| Visual object tracking is an important and active research field in computer vision, with a wide range of applications such as intelligent video surveillance, human-computer interaction, virtual reality, visual navigation and medical imaging. Given the initial position and size of a target object in the first frame, the goal of tracking is to estimate the position and size of the target in the subsequent frames. Many visual trackers have been proposed in the literature and have achieved good tracking accuracy in relatively simple environments, such as tracking rigid objects in static scenes. In practice, however, the target object is usually in a complex scenario with one or more interference factors, including occlusion, deformation, in-plane rotation, out-of-plane rotation, scale variation, fast motion, motion blur, illumination variation and background clutter. These factors make it harder to model the appearance of the target, so tracking the target accurately, efficiently and robustly remains a challenging problem. In recent years, research has been carried out on visual object tracking in complex scenarios, and corresponding trackers have been proposed. Based on an analysis of existing visual trackers, this dissertation conducts in-depth research on visual object tracking in complex scenarios using correlation filters and deep learning. The main work of this dissertation is as follows: 1. Combining correlation filters with the idea of parts matching, this dissertation proposes a collaborative tracker based on kernelized correlation filters that processes both holistic and reliable local features, abbreviated as the KCF-HR tracker. Firstly, a confidence metric is designed to measure the confidence of the correlation filter response map of an image. Based on this confidence metric, a local model is proposed for the fusion of multiple reliable parts, reducing the influence of the surrounding background and other interfering information. Then, a global model is proposed by combining 
global information and global-local interaction information to estimate the position and size of the target object. In the global model, a method for resetting unreliable parts is proposed to maintain the number and reliability of the reliable parts. The KCF-HR tracker is finally obtained by combining the above-mentioned local model and global model. The experimental results demonstrate that the KCF-HR tracker can effectively improve tracking performance under interference factors such as occlusion, deformation and rotation. 2. This dissertation proposes a correlation filter-based scale-adaptive (CFSA) visual tracker. Firstly, an improved EdgeBoxes-based proposal generation method is proposed to generate high-quality candidate object proposals for correlation filter tracking, which reduces the likelihood of tracking failures caused by the target reaching the boundary of the search region or moving partly or entirely out of it. Then, an object-detection-based scale estimation method is proposed to estimate the scale of the target object more accurately and efficiently. Combining the above two methods, the CFSA tracker is finally proposed. The experimental results show that the proposed tracker outperforms several state-of-the-art trackers in the case of fast motion or scale variation, while operating at about 19.5 frames per second. 3. Combining correlation filters and convolutional neural networks (CNNs), this dissertation proposes a correlation filter network-based visual tracker with adaptively weighted multi-layer CNN features, abbreviated as the AWMF-CFNet tracker. Firstly, a multi-scale CNN feature extraction network is proposed, which can simultaneously capture high-level semantic features and low-level spatial features. Then, an adaptively weighted feature integration network consisting of a holistic-part network, a spatial attention network and a channel attention network is proposed, which can overcome the limitation that all channels or 
regions have the same weight in the feature maps of existing visual trackers, and enhance the appearance representation ability of the feature maps. Based on the above-mentioned feature extraction network and feature integration network, the AWMF-CFNet tracker is proposed. The experimental results demonstrate that the AWMF-CFNet tracker performs favorably against several state-of-the-art trackers when the target is in challenging scenarios such as motion blur and illumination variation. 4. Most existing CNN-based trackers have low time efficiency because they update their network parameters online. To address this problem, a region-based Siamese network (RSNet) tracker is proposed, in which visual object tracking is reformulated as a similarity measurement problem. The RSNet tracker can accurately track the target object after a single round of offline training. Firstly, a multi-scale CNN feature fusion network is proposed, which repeatedly exchanges and fuses information across multi-resolution subnetworks to obtain feature maps that fully integrate deep semantic information and shallow spatial information. Then, a part-based CNN feature integration network is proposed, which generates a set of region-sensitive feature maps to improve the appearance representation ability of the feature maps. The proposed multi-scale CNN feature fusion network and part-based CNN feature integration network are integrated into a Siamese network to form the RSNet tracker. The experimental results show that the RSNet tracker achieves tracking performance comparable to that of several state-of-the-art CNN-based trackers under a variety of interference factors, while its time efficiency is significantly improved. |
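
The reformulation of tracking as a similarity measurement problem can be illustrated with a minimal sketch. It is not the RSNet tracker: it assumes raw pixel intensities in place of learned CNN features, uses zero-mean normalized cross-correlation as the similarity score, and searches exhaustively over a small region. The function names `similarity_map` and `locate` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def similarity_map(template, search):
    """Score every template-sized window of `search` against `template`.

    The score is zero-mean normalized cross-correlation, an assumed
    stand-in for the learned similarity of a Siamese network.
    """
    th, tw = template.shape
    sh, sw = search.shape
    t = template - template.mean()
    t_norm = np.linalg.norm(t) + 1e-12  # guard against division by zero
    scores = np.zeros((sh - th + 1, sw - tw + 1))
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            w = search[y:y + th, x:x + tw]
            w = w - w.mean()
            w_norm = np.linalg.norm(w) + 1e-12
            scores[y, x] = float((t * w).sum()) / (t_norm * w_norm)
    return scores


def locate(template, search):
    """Return (row, col, score) of the best-matching window's top-left corner."""
    scores = similarity_map(template, search)
    y, x = np.unravel_index(np.argmax(scores), scores.shape)
    return int(y), int(x), float(scores[y, x])


# Toy example: the "target" is a patch cut from a random search image.
rng = np.random.default_rng(0)
search = rng.random((32, 32))
template = search[10:18, 5:13].copy()
print(locate(template, search))
```

Because the template is matched against each frame with a fixed similarity function, nothing is updated online in this sketch; this mirrors, in a much simplified form, why a tracker trained once offline can avoid the cost of online parameter updates.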