Research On Visual Object Tracking Algorithm Based On Deep Siamese Network

Posted on: 2022-10-01
Degree: Master
Type: Thesis
Country: China
Candidate: J M Zhu
GTID: 2518306533477394
Subject: Computer application technology
Abstract/Summary:
Visual object tracking, one of the important research directions in computer vision, is widely used in fields such as intelligent monitoring, human-machine interaction and UAV control. Because tracking scenarios are complex, many disturbances arise from the background environment or from the target itself, including deformation, illumination variation and occlusion. To address these challenges, researchers have proposed many different object tracking algorithms. The SiamFC (Siamese Fully-Convolutional) algorithm achieves a good balance between tracking precision and speed, so a large number of algorithms derived from SiamFC have appeared, among which SiamRPN (Siamese Region Proposal Network) and SiamRPN++ are representative. To further improve the performance of existing Siamese trackers, this paper studies feature extraction and training samples. The main contributions are as follows:

1) Enhancing the feature expression of the object. The only initialization data in single object tracking (SOT) is the object bounding box labeled in the initial frame. Because our research area is generic object tracking, the feature extraction network cannot be trained using human prior knowledge. The appearance of the target may change constantly during tracking, so it is necessary to make full use of the target's appearance information in the initial frame and to extract a robust feature expression that improves the discriminability of the tracker. Based on these considerations, an improved ResNet-50 is used as the feature extraction network. The improvements are removing the network padding, reducing the network stride and using dilated convolution, which not only increases the receptive field of the shallow convolution layer but also makes the dimensions of the feature maps extracted from the last three residual blocks consistent. The feature maps can therefore be fused to enhance both the location information and the semantic information in the feature. In addition, although the appearance of the target may change constantly during tracking, the parts of the target always remain related to one another. A graph convolutional network (GCN) is therefore used to perform relational reasoning among the different parts of the target, which improves the tracker's discriminability under target deformation.

2) Balancing the training samples. In the SOT task there is only one target in an image. The way existing research divides training samples therefore leads to an imbalance between positive and negative samples, as well as between hard and easy samples. Easy negative samples dominate the training process, which makes it difficult to further improve the performance of the tracker. Unlike the OHEM (Online Hard Example Mining) algorithm, which trains the network exclusively on hard samples, we propose the Hard Sample Loss, which is based on the standard Cross Entropy Loss. It balances positive and negative samples and hard and easy samples at the same time, thereby addressing the poor discriminability of OHEM on easy samples. Moreover, both balances can be adjusted through two parameters, so the loss can be conveniently added to different tracking algorithms.

The models proposed in this paper have been tested on the OTB2015, VOT2016, VOT2018, UAV123 and LaSOT datasets. Experimental results show that the performance of our tracker is significantly improved compared with mainstream tracking models. In addition, a large number of ablation and visualization experiments are performed to verify the effectiveness and necessity of each parameter and implementation choice of the model.
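
The sample-balancing idea in point 2) can be illustrated with a small sketch. The abstract does not give the formula of the Hard Sample Loss, so the code below is not the author's loss: it swaps in a focal-loss-style weighting of the standard cross entropy as a stand-in, with two hypothetical parameters, `alpha` (positive versus negative balance) and `gamma` (hard versus easy balance). The function names and the toy scores are likewise assumptions made for illustration.

```python
import math


def weighted_losses(probs, labels, alpha, gamma):
    """Per-sample cross entropy with focal-style weights (illustrative stand-in).

    probs  : predicted foreground probabilities, each in (0, 1)
    labels : 1 for positive (target) samples, 0 for negative (background)
    alpha  : hypothetical weight on positives; negatives get 1 - alpha
    gamma  : hypothetical exponent; larger values down-weight easy samples more
    """
    eps = 1e-12
    losses = []
    for p, y in zip(probs, labels):
        p_t = p if y == 1 else 1.0 - p          # probability given to the true class
        class_w = alpha if y == 1 else 1.0 - alpha
        hard_w = (1.0 - p_t) ** gamma           # close to 0 for easy samples
        losses.append(-class_w * hard_w * math.log(max(p_t, eps)))
    return losses


def negative_share(losses, labels):
    """Fraction of the total loss contributed by negative samples."""
    neg = sum(l for l, y in zip(losses, labels) if y == 0)
    return neg / sum(losses)


# Toy SOT-like batch: one moderately confident positive, 99 easy negatives.
probs = [0.6] + [0.05] * 99
labels = [1] + [0] * 99

# alpha=0.5, gamma=0 is plain cross entropy up to a constant factor of 0.5,
# which does not change the shares computed below.
plain = weighted_losses(probs, labels, alpha=0.5, gamma=0.0)
balanced = weighted_losses(probs, labels, alpha=0.75, gamma=2.0)

print("negative share, plain CE:", negative_share(plain, labels))
print("negative share, weighted:", negative_share(balanced, labels))
```

In this toy batch the easy negatives account for most of the unweighted loss, while the weighted variant shifts most of the loss onto the single positive sample. Unlike OHEM, no sample is discarded: easy samples still contribute, only with a smaller weight.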
Keywords/Search Tags: Visual Object Tracking, Siamese Network, Graph Convolutional Network, Sample Balance