Visual Object Tracking Based On Deep Reinforcement Learning And Meta-learning

Posted on: 2021-02-26; Degree: Master; Type: Thesis
Country: China; Candidate: Y D Bai
GTID: 2518306047987829; Subject: Master of Engineering
Abstract/Summary:
Visual object tracking aims to locate, throughout a video sequence, the target specified in the first frame. The task must cope with spatial, temporal and appearance changes of the target, and it has important applications in many areas of daily life. This thesis builds on tracking algorithms based on Siamese networks and improves them in three respects: network structure, modeling of temporal information, and meta-learning of parameters.

1. Network structure. An improved Siamese network with two branches is proposed. First, a conventional Siamese network extracts features from the template and search patches, computes their cross-correlation, and is trained as a classification problem with standard supervised learning. The deeper layers are then copied to form a second branch, while the shallow layers are kept as shared layers. The parameters of the first branch and of the shared layers are frozen, and the second branch continues to be trained. In addition, a set of vectors is trained to fuse the features from the two branches. In this way the network preserves the generic representation learned from a large amount of data, so that it is not degraded by the noise of a specific video, and at the same time it can learn adjustments for that specific video domain. The accuracy of the algorithm is improved as a result.

2. Modeling temporal information. A reinforcement learning agent is constructed that adaptively adjusts the shape of the bounding box. The state consists of the shape of the bounding box in the previous frame and the feature map of the template area in the current frame. The agent outputs one of three kinds of actions that adjust the shape of the bounding box. The agent is trained frame by frame over the video sequence to optimize the shape of the bounding boxes, while the Siamese network decides the location of the target. This improves the accuracy of the algorithm, and because reinforcement learning is formulated as a Markov process, temporal information is built into the algorithm, which also helps ensure stability.

3. Meta-learning parameters. The adaptive fine-tuning of the tracker in a specific video domain is interpreted as a "one-shot learning" problem. Following the idea of meta-learning, the goal is to obtain network initialization parameters that are more sensitive to future new tasks. More specifically, the initialization of the tracker network should adapt quickly so that it robustly models a particular target in future frames. To achieve this, the step size of the standard gradient descent update is also optimized automatically. As a result, the trained network parameters are more robust to future changes and can be optimized more quickly on new video tasks with fewer backpropagation steps.
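The Siamese matching step mentioned in the abstract (extract features from the template and search patches, then compute their cross-correlation) can be illustrated with a minimal sketch. This is not the thesis's implementation: plain nested lists stand in for single-channel feature maps that a real tracker would obtain from a convolutional network, and the names `cross_correlate` and `peak` are hypothetical helpers introduced only for this example.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def cross_correlate(search: Matrix, template: Matrix) -> Matrix:
    """Slide `template` over `search` (valid positions only).

    Each entry of the returned response map is the sum of element-wise
    products between the template and the search window at that offset.
    """
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    response: Matrix = []
    for y in range(sh - th + 1):
        row = []
        for x in range(sw - tw + 1):
            score = sum(
                search[y + i][x + j] * template[i][j]
                for i in range(th)
                for j in range(tw)
            )
            row.append(score)
        response.append(row)
    return response


def peak(response: Matrix) -> Tuple[int, int]:
    """Return the (row, col) offset with the highest response."""
    _, y, x = max(
        ((v, y, x) for y, row in enumerate(response) for x, v in enumerate(row)),
        key=lambda t: t[0],
    )
    return y, x


# Toy stand-ins for feature maps (assumed values, for illustration only):
# the 2x2 template pattern is embedded in the 5x5 search map at row 2, col 1.
template = [[1.0, 2.0],
            [3.0, 4.0]]
search = [[0.0, 0.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 0.0, 0.0],
          [0.0, 1.0, 2.0, 0.0, 0.0],
          [0.0, 3.0, 4.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 0.0, 0.0]]

response = cross_correlate(search, template)
print(peak(response))  # (2, 1)
```

The peak of the response map marks the offset where the search region best matches the template, which is how a Siamese tracker of this kind decides the target location. Unnormalized correlation is used here only for brevity; on real feature maps the trained network shapes the features so that the peak is discriminative.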
Keywords/Search Tags:Visual Object Tracking, Siamese Network, Deep Reinforcement Learning, Meta-Learning