Visual object tracking is a fundamental computer vision task that aims to estimate the state of a specific target in a video sequence, including its location and size. It has many potential applications: monitoring the surroundings in real time for autonomous driving, automatically tracking suspicious objects in surveillance video, and helping to crop out regions of interest in video editing. Recent decades have seen considerable progress in the tracking research community. However, on complicated test sequences, existing tracking algorithms do not work as expected. For example, when the target's size changes drastically, the bounding boxes returned by trackers fail to enclose the real object precisely. In addition, when the test sequences are somewhat longer, trackers may drift from the target because continuous updating corrupts the model. These two problems, i.e., imprecise bounding boxes and suboptimal model update, severely degrade tracking performance. This thesis therefore focuses on them and provides some alternative solutions.

To address the problem of imprecise bounding boxes, this thesis proposes to take a binary mask as the output of a CNN for tracking and, based on that mask, to estimate a rectangle with multiple degrees of freedom as a relatively precise tracking result. During the generation of the training samples used to fine-tune the network parameters online, a Crop and Paste method is employed to fully utilize context information, a random value is added to the lightness component of the training samples to mimic illumination change, and Gaussian filtering is applied to mimic blur. During tracking, a bounding box approximation method that exploits the temporal consistency among adjacent video frames is proposed. The bounding boxes estimated by the proposed method have five free parameters, two more than those approximated in previous works. Therefore, the tracking results in
this thesis are more precise, and experiments show that the proposed method achieves state-of-the-art performance among real-time trackers.

In addition, to deal with the suboptimal model update problem, a minimization objective that, in some sense, performs optimal model update is proposed. This objective faces two challenges. The first is that the newly generated target model is unreliable. To overcome this, the objective imposes a penalty that limits the distance between the learned target model and the previous one. The second is that, as time goes on, the status of the previous target model, i.e., corrupted or not, cannot be determined. To resolve this dilemma, the objective includes a reinitialization term. A regularizer is also added to control the complexity of the transformation matrix. With some simplifications, the solution of the optimization problem reduces to an exponential moving average (EMA), which indicates that the method in this thesis can be viewed as an extension of EMA. Finally, experiments conducted on several common benchmarks demonstrate the effectiveness of the proposed approach in relatively long-term scenarios.

In summary, to solve the imprecise bounding box problem, this thesis proposes to employ a mask as the network's output and, based on it, to approximate a rectangle with multiple degrees of freedom as the tracking result. In addition, to deal with the suboptimal model update problem in relatively long-term scenarios, a minimization objective with regularization and reinitialization terms is proposed.
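
The connection between a penalized update objective and EMA can be illustrated with a simplified sketch. The quadratic objective below is an assumption made for illustration only, not the objective used in this thesis: it keeps a proximity penalty (weight `lam`) and a reinitialization term (weight `mu`), and omits the transformation matrix and its regularizer. The names `update_model`, `ema`, `lam`, and `mu` are hypothetical. Under these assumptions the minimizer is a weighted average, and with `mu = 0` it coincides with EMA with learning rate `alpha = 1 / (1 + lam)`.

```python
def update_model(new_obs, last_model, init_model, lam=4.0, mu=0.0):
    """Closed-form minimizer over m of the assumed quadratic objective

        ||m - new_obs||^2 + lam * ||m - last_model||^2 + mu * ||m - init_model||^2

    Setting the gradient to zero gives a weighted average of the three models.
    """
    denom = 1.0 + lam + mu
    return [
        (x + lam * p + mu * z) / denom
        for x, p, z in zip(new_obs, last_model, init_model)
    ]


def ema(new_obs, last_model, alpha):
    """Standard exponential moving average with learning rate alpha."""
    return [alpha * x + (1.0 - alpha) * p for x, p in zip(new_obs, last_model)]


# Toy two-dimensional "models" (purely illustrative values).
new_obs = [1.0, 2.0]      # model estimated from the current frame
last_model = [0.0, 0.0]   # model from the previous frame
init_model = [5.0, 5.0]   # model from the first frame

# Without the reinitialization term, the update equals EMA with alpha = 1 / (1 + lam).
no_reinit = update_model(new_obs, last_model, init_model, lam=4.0, mu=0.0)
plain_ema = ema(new_obs, last_model, alpha=1.0 / (1.0 + 4.0))

# With mu > 0, the result is additionally pulled toward the initial model.
with_reinit = update_model(new_obs, last_model, init_model, lam=4.0, mu=1.0)
```

In this simplified setting, the reinitialization weight `mu` limits how far the model can move away from the first-frame model, which is one way to read the role of the reinitialization term when the status of the previous model is unknown.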