
Siamese Network-based Visual Object Tracking

Posted on: 2022-08-23  Degree: Master  Type: Thesis
Country: China  Candidate: C J Fang  Full Text: PDF
GTID: 2518306512952139  Subject: Signal and Information Processing
Abstract/Summary:
Visual object tracking is an important research topic in computer vision, with applications such as autonomous driving, video surveillance, human-computer interaction, and medical diagnosis. Although great progress has been made over decades of development, building a tracker that is both real-time and accurate remains challenging because of changes in target appearance, illumination variation, background interference, occlusion, and similar factors. With the rise of deep learning, neural networks have been adopted in the mainstream frameworks for visual object tracking. Among them, methods built on the Siamese network architecture have shown excellent tracking performance. By treating tracking as a verification problem, these methods train a pair of neural networks with shared weights, construct an object template, and generate, frame by frame, a similarity confidence map between the search region and the object template. The region with the highest confidence score is taken as the location of the tracked target. Siamese network-based trackers have achieved strong performance in terms of accuracy and real-time efficiency, and improving the Siamese tracking architecture has become a popular trend in recent years. Along this line, this thesis studies Siamese network-based visual object tracking. The main contents are as follows:

(1) Existing Siamese networks generally model features from a single frame only, and convolution and pooling operations lose structural information. To alleviate these issues, this thesis introduces an attention mechanism into the Siamese framework. A spatial-temporal attention module is designed to fuse object appearance information from different time steps, so that useful features are emphasized while redundant information and clutter are suppressed. The proposed model has two advantages. First, the spatial attention mechanism overcomes the limited receptive field of convolutional neural networks and the loss of structural detail caused by pooling. It fully exploits structural dependencies among feature points in the spatial domain, mitigating the performance degradation caused by target appearance variation and occlusion in complex scenes. Second, the temporal attention mechanism models inter-frame dependencies of the target, which enriches the diversity of appearance features and makes the learned representation more robust. Experimental results show that the improved method increases tracking accuracy.

(2) In scenes where interference from similar targets occurs frequently, an attention mechanism based on visual consistency is likely to cause tracking drift. To address this, a Siamese single-object tracking method that combines appearance and motion features is presented. An appearance alignment model is designed based on optical flow estimation, which introduces motion information into the Siamese network. Experimental results demonstrate that fusing the flow and spatial-temporal attention models makes motion information and visual features complementary, which alleviates interference from similar targets and improves tracking performance.
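
The similarity-confidence-map step described in the abstract can be illustrated with a toy sketch. This is not the thesis's implementation: it assumes single-channel feature maps given as plain nested lists, uses raw cross-correlation as the similarity measure, and the names `similarity_map` and `locate_target` are hypothetical. In a real tracker the two inputs would be multi-channel embeddings produced by the shared-weight network.

```python
def similarity_map(search, template):
    """Slide the template over the search region and return the 2-D
    cross-correlation response (one score per candidate position)."""
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    response = []
    for y in range(sh - th + 1):
        row = []
        for x in range(sw - tw + 1):
            # Dot product between the template and the window at (y, x).
            score = sum(
                search[y + dy][x + dx] * template[dy][dx]
                for dy in range(th)
                for dx in range(tw)
            )
            row.append(score)
        response.append(row)
    return response


def locate_target(response):
    """Return (row, col) of the highest score in the response map."""
    best = max(
        ((score, y, x)
         for y, row in enumerate(response)
         for x, score in enumerate(row)),
        key=lambda item: item[0],
    )
    return best[1], best[2]


# Toy feature maps: hypothetical values standing in for network embeddings.
template = [[1, 2],
            [3, 4]]
search = [[0, 0, 0, 0],
          [0, 1, 2, 0],
          [0, 3, 4, 0],
          [0, 0, 0, 0]]

response = similarity_map(search, template)
print(locate_target(response))  # (1, 1): the template sits one row and one column in
```

Raw correlation favors high-magnitude regions, so practical trackers typically rely on learned embeddings and a trained scoring head rather than on this unnormalized score alone.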
Keywords/Search Tags:Visual object tracking, deep learning, Siamese networks, attention mechanism, optical flow