
Algorithm Study On Object Tracking Via Language And Visual Model

Posted on: 2020-08-24 | Degree: Master | Type: Thesis
Country: China | Candidate: C X Li | Full Text: PDF
GTID: 2428330590463148 | Subject: Engineering
Abstract/Summary:
As one of the most basic human visual abilities, visual object tracking has long been a focus of attention in academia and industry, and it has been successfully applied in surveillance, human-computer interaction, assisted driving, and other fields. Nevertheless, how to bring object tracking technology up to large-scale application and industry standards in terms of practical performance indicators such as accuracy, robustness, and real-time performance remains an open problem. In particular, existing trackers suffer from coarse target appearance modeling, insufficient fusion of multi-modal information, and dependence on large amounts of data. Recently, trackers based on Siamese networks have drawn increasing interest in visual tracking because they achieve a good balance between accuracy and speed. However, most of them struggle with significant appearance variations and similar distractors, because they mainly focus on constructing a matching network offline without online updating, and they use only the target feature from the first frame as the cue for target search. To address these problems, we propose two algorithms: a novel hierarchical tracking method (named Hi-Tracker) that adaptively fuses Siamese features, and a multi-branch Siamese tracking algorithm based on semantic modeling and appearance modeling (named SegA-Siam).

(1) Hi-Tracker integrates discriminative correlation filters into the Siamese matching network through end-to-end training to improve the discriminative power of each feature layer. Then, based on an analysis of a simple yet effective online motion model and the peak-versus-noise ratio (PNR) of the response maps, Hi-Tracker incorporates a fast transformation learning model into the network to capture target appearance variations and to improve robustness to similar distractors, respectively. Finally, Hi-Tracker fuses the network outputs from complementary Siamese features to estimate the optimal target state. Experimental results on OTB2013 [1] and OTB2015 [2] show that Hi-Tracker not only achieves competitive performance compared with other state-of-the-art trackers, but also runs at a real-time speed of 25 FPS on a GPU.

(2) In SegA-Siam, we use natural language to locate the target coarsely and then use visual features to refine the target location. Specifically, to improve discriminative ability, we use a Long Short-Term Memory network (LSTM) [3] to model the appearance. SegA-Siam consists of two branches, both of which are Siamese networks. One branch uses natural language to understand the semantics of the candidate region, and the other uses a bidirectional LSTM to build a robust model of the target's appearance. The semantic understanding branch, which has a structure similar to SiamFC [4], classifies the foreground and background of the candidate region and produces a binary segmentation mask. In the appearance modeling branch, a bidirectional LSTM processes the deep features, which are fed into the network from left to right along the width of the feature map. This strengthens the associations among the object's features and improves discriminative ability. The two branches are trained separately and combined only at test time, where the two response maps are fused by weighting into the final response map that determines the target position. We observe that the peaks in the response map are concentrated mainly near the target, but the highest peak is not necessarily at the exact target position. Therefore, multiple peaks are selected, each corresponding to a candidate target box. The overlap between each candidate box in the current frame and the target box in the previous frame is computed, and the final target box is determined by combining the peak values and the overlap rates.
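
The test-time selection step of SegA-Siam (weighted fusion of the two response maps, multi-peak selection, and overlap with the previous box) can be illustrated with a minimal Python sketch. This is not the thesis implementation. The abstract does not give the fusion weight, the number of peaks, the way a peak is turned into a box, or the rule for combining peak value and overlap, so the names and defaults below (`w_sem`, `k`, `stride`, `alpha`, a box centered on the peak with the previous box size, and a linear score) are assumptions made purely for illustration.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def fuse(sem_map, app_map, w_sem=0.5):
    """Weighted sum of two equally sized response maps (lists of rows).

    w_sem is a hypothetical fusion weight for the semantic branch.
    """
    return [[w_sem * s + (1.0 - w_sem) * a for s, a in zip(rs, ra)]
            for rs, ra in zip(sem_map, app_map)]


def local_peaks(resp, k=3):
    """Return up to k strict local maxima as (value, row, col), highest first."""
    rows, cols = len(resp), len(resp[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            v = resp[r][c]
            neighbours = [resp[rr][cc]
                          for rr in range(max(0, r - 1), min(rows, r + 2))
                          for cc in range(max(0, c - 1), min(cols, c + 2))
                          if (rr, cc) != (r, c)]
            if all(v > n for n in neighbours):
                peaks.append((v, r, c))
    peaks.sort(reverse=True)
    return peaks[:k]


def select_box(resp, prev_box, stride=8, k=3, alpha=0.5):
    """Pick the candidate box that best combines peak value and overlap.

    Assumptions (not from the thesis): each peak yields a box centered at
    the peak (scaled by `stride`) with the previous box size, and the score
    is alpha * normalized_peak + (1 - alpha) * IoU with the previous box.
    """
    peaks = local_peaks(resp, k)
    if not peaks:
        return prev_box  # no peak found: keep the previous estimate
    top = peaks[0][0]
    _, _, w, h = prev_box
    best_box, best_score = prev_box, float("-inf")
    for v, r, c in peaks:
        cx, cy = c * stride, r * stride
        box = (cx - w / 2.0, cy - h / 2.0, w, h)
        peak_term = v / top if top > 0 else 0.0
        score = alpha * peak_term + (1.0 - alpha) * iou(box, prev_box)
        if score > best_score:
            best_box, best_score = box, score
    return best_box


# Usage: the highest peak sits at (row 1, col 1), but the slightly lower
# peak at (row 3, col 3) overlaps the previous box and is selected instead.
sem = [[0.0] * 5 for _ in range(5)]
sem[1][1], sem[3][3] = 1.0, 0.9
fused = fuse(sem, sem)
chosen = select_box(fused, prev_box=(16, 16, 16, 16))
```

In this toy case the overlap term outweighs the small difference in peak height, which mirrors the observation that the highest peak is not necessarily the target position.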
Keywords/Search Tags:Visual Object Tracking, Deep Learning, Siamese Neural Network, Natural Language, Correlation Filter