With the rapid development of computer hardware and artificial intelligence, deep learning algorithms for computer vision and natural language processing have become a major research focus in recent years. Object tracking, a core computer vision task, is widely used in military operations, autonomous driving, intelligent video surveillance, and other applications because of its practicality, research value, and commercial value. Siamese neural network object tracking algorithms have become the mainstream research direction in object tracking because of their robustness and real-time performance. However, existing Siamese trackers still have significant shortcomings. First, most object tracking algorithms either do not update the template at all or rely on simple multi-template fusion by linear weighting and convolution, so they cannot adapt to changes in the target's appearance over time. Second, Siamese trackers fuse search-region features with template features using simple depth-wise convolution, which is limited by its receptive field and struggles to establish global connections. To address these problems, this thesis reviews the current state of object tracking algorithms and proposes two object tracking algorithms that improve performance through a template update network and an inter-layer feature fusion network, respectively. The main contributions and innovations of this thesis are as follows: (1) A Siamese neural network object tracking algorithm that integrates temporal information from multiple templates is proposed. Conventional Siamese trackers use only the initial template selected by the user in the first frame of the video, which remains fixed throughout tracking. As the
appearance and shape of the target change dynamically during tracking, the performance of the algorithm declines significantly. To solve this problem, this study draws on the Transformer architecture and designs a Multi-Template Fusion Module that fuses multiple templates in real time during tracking and updates the template. In addition, a training method is developed specifically for the template update module; it uses temporal information to construct training samples, improving performance while retaining a degree of real-time capability. (2) A layer-wise feature fusion object tracking algorithm based on the attention mechanism is proposed. Current Siamese trackers typically use depth-wise convolution to fuse the features of the search region and the template, and then feed the result to a localization network that predicts the center position and shape of the target. However, depth-wise convolution is limited by the size of its receptive field, which makes it difficult to establish a global connection between the template and the search region and causes fine-grained features to be lost after fusion. To address this issue, this study proposes a layer-wise feature fusion network based on the attention mechanism, which uses attention to select and fuse the shallow detail features and deep semantic features of the search region and the target template, thereby enhancing both. This approach enriches feature detail and overcomes the receptive-field limitation. Furthermore, a localization network that fuses online and offline tracking is proposed, in which the online tracking branch improves the accuracy of the final prediction. Extensive comparative experiments and ablation studies were conducted on datasets commonly used in object tracking, including OTB100, LaSOT, UAV123, VOT2016, and GOT-10k, to evaluate the
proposed object tracking algorithms against current state-of-the-art methods. The results verify the effectiveness of the two proposed algorithms and demonstrate that, by establishing global connections among features, the attention mechanism and the Transformer structure can outperform convolutional networks in template updating and feature fusion.
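The contrast between the two fusion strategies discussed above can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes 1D feature maps and flattened tokens for brevity, omits the learned projections and multi-head structure that a Transformer-style module would normally have, and the function names `depthwise_xcorr` and `cross_attention` are hypothetical, chosen only for illustration.

```python
import math


def depthwise_xcorr(search, template):
    """Per-channel sliding-window correlation (1D for brevity).

    search:   list of C channels, each a list of length Ls
    template: list of C channels, each a list of length Lt (Lt <= Ls)
    Returns C channels of length Ls - Lt + 1. Each output value depends
    only on a window of Lt search positions (a limited receptive field).
    """
    out = []
    for s_ch, t_ch in zip(search, template):
        lt = len(t_ch)
        out.append([
            sum(s_ch[i + k] * t_ch[k] for k in range(lt))
            for i in range(len(s_ch) - lt + 1)
        ])
    return out


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(search_tokens, template_tokens):
    """Scaled dot-product cross-attention without learned projections.

    search_tokens:   list of Ns vectors (queries), each of dimension d
    template_tokens: list of Nt vectors (used as keys and values)
    Every search token attends to every template token, so each fused
    output can draw on the whole template (a global connection).
    """
    d = len(search_tokens[0])
    fused = []
    for q in search_tokens:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
            for k in template_tokens
        ]
        weights = softmax(scores)
        fused.append([
            sum(w * v[j] for w, v in zip(weights, template_tokens))
            for j in range(d)
        ])
    return fused
```

In this sketch, `depthwise_xcorr` produces one response per window position, so each output sees only as many search positions as the template is long, whereas `cross_attention` returns one fused vector per search token, each a weighted combination of all template tokens. With multiple templates, their tokens could simply be concatenated into `template_tokens`; whether the thesis fuses templates this way is not stated in the abstract.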