Visual object tracking has long been an active research topic in computer vision and is widely used in video surveillance, driver assistance, human-computer interaction, and other applications. With the advancement of national projects such as "Safe City" and the "3111 Project", demand for security monitoring equipment has risen sharply and the volume of video data has grown exponentially. How to fully exploit the information in video has therefore become a research focus, and visual object tracking is one of the key technologies. Target tracking is also of vital importance in the current surge of research on automated and assisted driving. However, because real-life scenes are complex, sources of interference are numerous, and tracking algorithms must meet strict real-time requirements, traditional object tracking algorithms have not been deployed commercially on a large scale. In recent years, as convolutional neural networks have achieved breakthroughs in other areas of computer vision, more and more researchers have tried to apply them to object tracking. At present, tracking methods based on convolutional neural networks face two main problems: first, it is difficult to combine high speed with high accuracy; second, appropriate training methods are lacking. To address these problems, this paper proposes a multi-scale, self-updating single-object tracking algorithm based on siamese networks. The main work of this paper is as follows:
(1) In line with the characteristics of convolutional neural networks, a better-performing backbone network is used to extract features, and convolutional features from different network levels are fused to strengthen the feature representation.
(2) A feature pyramid replaces the image pyramid, which improves performance when the target scale changes rapidly. A sparse search strategy based on region proposals and a position regression network are also embedded into the convolutional neural network. Together, these two methods increase localization accuracy and reduce the computational redundancy caused by the sliding-window search strategy.
(3) A quality evaluation index for tracking results is proposed to drive adaptive, intermittent updating of the template image. This strategy helps the model learn changes in the target's appearance while avoiding the computational burden of updating on every frame.
(4) The way input image patches are extracted before the network's forward pass is modified to reduce the impact of target deformation and excessive background on accuracy. At the same time, the search area is expanded to improve the tracking of small targets.
(5) To address the lack of training sets and training methods, a large number of training image pairs are extracted to construct training sets that assist model training.
Experiments show that the proposed algorithm is more accurate than the baseline algorithm SiameseFC: on OTB2013, the success-plot score increases from 0.612 to 0.687 and the precision increases from 0.815 to 0.894, while the algorithm still runs in real time on a GPU (27 fps). In addition, the tracking success rate improves markedly in the presence of similar objects or complex backgrounds.
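
For readers unfamiliar with the two OTB2013 figures quoted here, the following is a minimal sketch of how such scores are commonly computed. It assumes the usual OTB convention: precision is the fraction of frames whose predicted box centre lies within 20 pixels of the ground-truth centre, and the success score is the area under the curve of the fraction of frames whose overlap (IoU) exceeds a threshold swept over [0, 1]. The function names, the `(x, y, w, h)` box format, and the 21-point threshold grid are illustrative assumptions, not details taken from this paper.

```python
from math import hypot


def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def center_error(a, b):
    """Euclidean distance in pixels between the centres of two boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return hypot((ax + aw / 2) - (bx + bw / 2), (ay + ah / 2) - (by + bh / 2))


def precision(pred, gt, threshold=20.0):
    """Fraction of frames whose centre error is within `threshold` pixels.

    The 20-pixel default is the assumed OTB convention.
    """
    errors = [center_error(p, g) for p, g in zip(pred, gt)]
    return sum(e <= threshold for e in errors) / len(errors)


def success_auc(pred, gt, steps=21):
    """Mean success rate over `steps` evenly spaced IoU thresholds in [0, 1].

    Averaging the success rates approximates the area under the success plot.
    """
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    thresholds = [i / (steps - 1) for i in range(steps)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)
```

Calling `precision(pred_boxes, gt_boxes)` and `success_auc(pred_boxes, gt_boxes)` on per-frame box lists for one sequence gives the per-sequence scores. Benchmark figures are typically averaged over all sequences in the dataset.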