Deep Attention Object Tracking Network And Embedded Deployment

Posted on: 2022-05-15
Degree: Master
Type: Thesis
Country: China
Candidate: W P Hu
GTID: 2518306602494074
Subject: Master of Engineering
Abstract/Summary:
Visual object tracking is one of the core problems of computer vision. Discriminative deep learning methods based on Siamese networks have achieved significant performance in single object tracking. However, to achieve fast tracking, mainstream trackers generally adopt conventional convolutional backbone networks, which limits tracking accuracy. To improve tracking accuracy, this thesis introduces the attention mechanism and constructs self-attention modules and spatial attention modules to extract more discriminative object features. These features improve the performance of the single object tracking model without significantly increasing computational complexity. In addition, because an embedding feature taken only at the center point cannot fully represent an object instance in multi-object tracking, an attention-based object embedding model is designed. Building on this work, the embedded deployment of the attention-based object tracking network is studied. The main work and innovations of the thesis are as follows:

1. Object Tracking with Transformer (OTTR) is designed for single object tracking. First, the Transformer, a self-attention model, is added to the deep object tracking network to accurately locate the object and estimate its size. Second, a Unilateral Spatial Attention (USA) module guided by higher-level semantics is designed to improve the localization accuracy of objects, while avoiding the large number of parameters introduced by an FPN (Feature Pyramid Network) structure. The method was verified on multiple single object tracking datasets. The experimental results show that OTTR achieves AUC scores of 0.654, 0.448 and 0.585 on the OTB100 low-resolution benchmark, the GOT-10k dataset and the UAV123 dataset, respectively, and that its tracking accuracy on UAV123 is 3.7% higher than that of the SiamRPN++ tracker. The AUC of the Tracker Based on Unilateral Spatial Attention (TBUSA) is 1.1% higher than that of the DiMP single object tracker on GOT-10k and 0.6% higher on UAV123.

2. A Crosshair Embedding Multi-Object Tracker (CEMOT) based on USA is designed. First, the unilateral attention module is applied to the multi-object tracking model, integrating high-level features into low-level features without additional computation. Compared with FairMOT, the resulting network improves by 0.2% on the MOT16 benchmark and by 0.1% on the MOT17 benchmark. Second, a multi-object tracking model built on a center-point detector uses only the embedding feature at the center point for data association, so the embedding cannot fully express the object instance. To address this, a five-point crosshair (cross-star) embedding data association method is designed. In the training stage, the constraints include a five-point loss over the center point and its surrounding points. In the tracking stage, the embedding features at the object points are used to associate the set of detections with the set of tracklets. On the MOT16-val benchmark, MOTA improves by 0.1% over the CEMOT baseline, and on the MOT20-val benchmark, MOTA likewise improves by 0.1% over the CEMOT baseline.

3. Two USA-based single object tracking models, MDNet and SiamFC, are implemented on an FPGA (Field Programmable Gate Array). First, the USA-based MDNet and SiamFC models are designed with the TensorFlow framework and trained on the GPU. Second, on the VCU1525 platform, the FPGA is used to accelerate the backbone network of the models (including convolution, pooling, activation functions and other modules). Without loss of tracking accuracy, the MDNet network runs 37.69% faster on the FPGA than on a GTX1060 GPU, and twice as fast as on a CPU.
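The AUC figures quoted for OTB100 and UAV123 come from the success plot used in OTB-style evaluation: for each overlap threshold, the success rate is the fraction of frames whose predicted box has an IoU with the ground truth above that threshold, and the AUC is the average success rate across thresholds. The minimal sketch below assumes (x, y, w, h) boxes and 21 evenly spaced thresholds over [0, 1], following the OTB toolkit convention; the helper names are illustrative, and this is not the evaluation code used in the thesis.

```python
from typing import Sequence, Tuple

# A box is (x, y, w, h): top-left corner plus width and height (assumed format).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned (x, y, w, h) boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def success_auc(pred: Sequence[Box], gt: Sequence[Box], num_thresholds: int = 21) -> float:
    """Area under the success plot: mean success rate over IoU thresholds in [0, 1]."""
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    if not overlaps:
        return 0.0
    # With 21 thresholds: 0.00, 0.05, ..., 1.00
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    # A frame succeeds at threshold t when its IoU is strictly greater than t.
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)
```

With this strict-inequality convention, even a perfect tracker scores 20/21 (about 0.952) in this sketch, because no IoU can exceed the final threshold of 1.0.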
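The five-point crosshair embedding in item 2 can be illustrated with a small example. The abstract does not give the point spacing, how the five embeddings are fused, or the matching solver, so this minimal sketch rests on stated assumptions: the four surrounding points sit a hypothetical `offset` cells to the left, right, above and below the center; the five L2-normalized vectors are averaged; detections and tracklets are compared by cosine distance; and a greedy matcher gated by a hypothetical `max_dist` stands in for the linear-assignment step that FairMOT-style trackers use. Only inference-time association is shown, not the five-point training loss.

```python
import math
from typing import List, Sequence, Tuple

# emb_map[y][x] is the embedding vector predicted at feature-map cell (x, y).
EmbMap = Sequence[Sequence[Sequence[float]]]


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(c * c for c in v)) or 1.0  # all-zero vectors stay zero
    return [c / norm for c in v]


def crosshair_embedding(emb_map: EmbMap, cx: int, cy: int, offset: int = 1) -> List[float]:
    """Fuse embeddings at the center and its left/right/up/down neighbors.

    Assumptions (not from the thesis): the hypothetical `offset` spacing,
    clamping to the map border, and averaging of the five normalized vectors.
    """
    h, w = len(emb_map), len(emb_map[0])
    points = [(cx, cy), (cx - offset, cy), (cx + offset, cy), (cx, cy - offset), (cx, cy + offset)]
    vecs = []
    for x, y in points:
        x = min(max(x, 0), w - 1)
        y = min(max(y, 0), h - 1)
        vecs.append(l2_normalize(emb_map[y][x]))
    mean = [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]
    return l2_normalize(mean)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Inputs are expected to be unit vectors, so 1 - dot(a, b) is the cosine distance.
    return 1.0 - sum(x * y for x, y in zip(a, b))


def greedy_associate(
    track_embs: Sequence[Sequence[float]],
    det_embs: Sequence[Sequence[float]],
    max_dist: float = 0.4,
) -> List[Tuple[int, int]]:
    """Match (track, detection) index pairs greedily by increasing cosine distance.

    A simplification: FairMOT-style trackers solve a linear assignment instead.
    `max_dist` is a hypothetical gating threshold.
    """
    candidates = sorted(
        (cosine_distance(t, d), ti, di)
        for ti, t in enumerate(track_embs)
        for di, d in enumerate(det_embs)
    )
    used_tracks, used_dets, matches = set(), set(), []
    for dist, ti, di in candidates:
        if dist > max_dist:
            break  # remaining pairs are even farther apart
        if ti in used_tracks or di in used_dets:
            continue
        used_tracks.add(ti)
        used_dets.add(di)
        matches.append((ti, di))
    return matches


if __name__ == "__main__":
    # Toy 5x5 map: columns 0-2 carry embedding [1, 0], columns 3-4 carry [0, 1].
    emb_map = [[[1.0, 0.0] if x <= 2 else [0.0, 1.0] for x in range(5)] for _ in range(5)]
    dets = [crosshair_embedding(emb_map, 3, 2), crosshair_embedding(emb_map, 1, 2)]
    tracks = [[1.0, 0.0], [0.0, 1.0]]
    print(greedy_associate(tracks, dets))  # prints [(0, 1), (1, 0)]
```

In the toy map, the detection centered at column 3 picks up one neighbor from the other region, which pulls its fused vector slightly toward [1, 0], yet it still matches the correct track. Sampling neighbors in this way lets the fused embedding reflect more of the object than the center cell alone, which is the motivation the abstract gives for the five-point design.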
Keywords/Search Tags: Deep Learning, Single Object Tracking, Multiple Object Tracking, Self-Attention, Unilateral Spatial Attention