| Object tracking, a research hotspot in computer vision, aims to capture the motion state of a target of interest accurately and quickly and to track it robustly. It has considerable theoretical significance and application value and is widely used in fields such as security monitoring and intelligent transportation. However, most existing tracking methods rely on convolutional neural networks that operate locally on features, pay little attention to global information, and use only the last layer of features; they therefore cannot make full use of deep and shallow information, which results in weakly discriminative feature descriptions. In addition, relying solely on the features of the initial frame for template matching cannot accommodate the continual changes in target appearance over the course of tracking, so tracking drift occurs easily. Targets captured from an aerial camera view are especially difficult: they carry little target information, suffer from complicated background interference, and move quickly, which makes designing a tracking method with stable performance a significant challenge. Based on practical application requirements, this thesis proposes two new solutions, built on the Siamese network architecture and the Transformer network structure, to address the shortcomings of typical tracking methods. The main research contents and contributions of the thesis are as follows. 1) To make full use of deep and shallow information for better feature description and to adapt better to changes in target appearance, we propose an object tracking algorithm that combines Transformer feature enhancement with a template update phase (TUTrack). First, a feature extraction network based on the Siamese network architecture pre-extracts features. A feature enhancement network consisting of channel attention and Transformer modules is constructed to enhance the saliency of the
pre-extracted feature vectors both in context and across channels. Then, the enhanced features are used to estimate the target state through a classification and regression network. Finally, a template update strategy is designed to update the sample templates selectively and adaptively based on confidence scores. Experimental results show that the proposed tracking method achieves excellent tracking performance on the typical benchmark datasets OTB100, LaSOT, and GOT-10k, with particularly good robustness in complex scenarios such as target appearance change, background interference, and motion blur. 2) To obtain richer global context awareness and spatio-temporal information while improving robustness when tracking tiny targets in the aerial domain, we propose a Transformer feature integration network for object tracking (TFITrack). Building on the Transformer network structure, the method introduces a similarity calculation layer, a temporal context filtering layer, and a dual attention module into the encoder to aggregate spatio-temporal and global contextual information. The similarity calculation layer and the dual attention module strengthen the similarity between features and apply corrections along the channel and spatial dimensions to improve feature representation. The temporal context filtering layer adaptively ignores unimportant feature information, reducing the number of model parameters involved in computation while preserving tracking performance. Experimental results show that the proposed tracking method achieves improved tracking performance on seven benchmark datasets, namely OTB100, LaSOT, GOT-10k, DTB70, UAV123, UAV20L, and UAV123@10fps, and is especially robust to aerial photography challenges such as significant viewpoint change, low resolution, and fast target motion. |
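
The confidence-based template update described for TUTrack can be illustrated with a minimal sketch. The abstract does not give the actual update rule, so everything here is an assumption made for illustration: the class name `TemplateBank`, the `threshold` and `momentum` values, and the choice of a moving-average blend. The sketch only shows the general idea of keeping the initial template fixed and refreshing a second template when the tracker is confident.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TemplateBank:
    """Hypothetical container: a fixed initial template plus an updatable one."""
    initial: List[float]      # template features from the first frame (never changed)
    dynamic: List[float]      # template features that may be refreshed over time
    threshold: float = 0.7    # assumed confidence threshold, not from the thesis
    momentum: float = 0.9     # assumed weight kept on the old dynamic template

    def update(self, candidate: List[float], confidence: float) -> bool:
        """Blend `candidate` into the dynamic template if confidence is high enough.

        Returns True when an update was applied, False when it was skipped.
        """
        if confidence < self.threshold:
            # Low confidence: the prediction may be drifting, so keep the old template.
            return False
        self.dynamic = [
            self.momentum * old + (1.0 - self.momentum) * new
            for old, new in zip(self.dynamic, candidate)
        ]
        return True


bank = TemplateBank(initial=[1.0, 0.0], dynamic=[1.0, 0.0])
skipped = bank.update([0.0, 1.0], confidence=0.5)   # below threshold, no change
applied = bank.update([0.0, 1.0], confidence=0.9)   # above threshold, blended in
```

Gating the update on confidence is one common way to limit the risk of contaminating the template with background or occluded content; the blend weight controls how quickly the template follows appearance changes.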