Siamese Network-based Researches For Visual Tracking

Posted on: 2021-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhang
Full Text: PDF
GTID: 2428330626460394
Subject: Computer technology
Abstract/Summary:
Visual tracking is an important research direction in computer vision, with broad applications in video surveillance, intelligent transportation, visual navigation, and military guidance. Target tracking is the process of continuously estimating the state of a specific target across a video sequence while the background changes. However, scenes are complex and diverse, and the target's appearance changes constantly, so achieving stable tracking in practice remains very challenging. The main concerns in visual tracking are speed and accuracy.

In recent years, target tracking algorithms based on the Siamese network have received extensive attention from researchers because of their superior tracking performance. This kind of method transforms the target tracking problem into patch matching: the target is located by training a classifier to compute the similarity between the template and the search area. End-to-end offline training greatly simplifies the tracking problem. This thesis studies and improves the network structure of a target tracking algorithm based on the fully convolutional Siamese network.

To handle fast target motion, this thesis presents an Embedded Optical Flow Siamese Network for real-time visual tracking. The model consists of an optical flow positioning network and a visual tracking network. First, the optical flow network computes the optical flow between adjacent frames. Second, a regression module trained on the optical flow information predicts the precise position of the target in the next frame. Finally, this position is used to crop a fixed-size search area from the current frame, which is then passed to the tracking network. The first input of the tracking network is the target specified in the first frame, and the second input is the search area produced by the optical flow positioning network. The tracking network extracts features from both inputs and measures their similarity to perform tracking. Experiments show that this method locates the search area in the current frame more accurately, significantly improves the accuracy and success rate of the original backbone network, and effectively avoids tracking drift.

To address the problem that a shallow network cannot effectively represent image information, this thesis also proposes a standalone attention module. The module sharpens the distinction between target foreground and background by enhancing target foreground features and suppressing semantic background information. The attention module is connected to the backbone network through a long skip connection. The convolution layer uses the Inception network structure, which gives the model more capacity for self-directed decision making. Experimental results show that introducing the attention mechanism improves discrimination between the target foreground and the semantic background, and thus significantly improves the accuracy and robustness of the algorithm.
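
The patch-matching step described in the abstract, where the similarity between the template and the search area determines the target location, can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes the shared network has already produced feature maps of shape (channels, height, width), it uses plain sliding-window cross-correlation as the similarity measure, and the function names `cross_correlation_score_map` and `locate_peak` are hypothetical.

```python
import numpy as np


def cross_correlation_score_map(template_feat, search_feat):
    """Slide the template features over the search features.

    Both inputs are assumed to be arrays of shape (channels, height, width)
    that a shared feature extractor would have produced. Each output entry
    is the sum of elementwise products between the template and one
    same-sized window of the search features.
    """
    c, th, tw = template_feat.shape
    c2, sh, sw = search_feat.shape
    if c != c2:
        raise ValueError("template and search must have the same channel count")
    if th > sh or tw > sw:
        raise ValueError("template must not be larger than the search area")

    out_h, out_w = sh - th + 1, sw - tw + 1
    scores = np.empty((out_h, out_w), dtype=np.float64)
    for y in range(out_h):
        for x in range(out_w):
            window = search_feat[:, y:y + th, x:x + tw]
            scores[y, x] = np.sum(window * template_feat)
    return scores


def locate_peak(scores):
    """Return the (row, col) of the highest similarity score."""
    y, x = np.unravel_index(np.argmax(scores), scores.shape)
    return int(y), int(x)
```

The peak of the score map gives the top-left offset of the best-matching window in feature-map coordinates. A real tracker would map that offset back to image coordinates and would typically compute the correlation with a convolution routine rather than Python loops.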
Keywords/Search Tags: Visual tracking, Siamese network, Optical flow estimation, Attention mechanism