
Research On Visual Tracking Based On Deep Learning

Posted on: 2019-11-20
Degree: Master
Type: Thesis
Country: China
Candidate: L X Li
GTID: 2428330566496850
Subject: Computer technology
Abstract/Summary:
Visual tracking is an important branch of computer vision that has attracted the attention of many researchers. It has been applied in many settings, including intelligent video surveillance, automatic transmission, and human-computer interaction. The task of visual tracking is to estimate the state of an arbitrary target throughout a video sequence, given only its ground truth in the first frame. Factors such as scale variation, fast motion, occlusion, and deformation still degrade tracking performance and make the problem hard, so further research is needed.

Correlation filters have been widely used in visual tracking, typically with handcrafted visual features to model the target. Although computing in the frequency domain boosts tracking speed, the precision achievable with handcrafted features or features from shallow networks has nearly reached its limit. Deep learning, following its success across computer vision, has shown clear advantages in visual tracking. Deep models such as convolutional neural networks (CNNs) and residual networks extract richer and more accurate features, which improve the capability and robustness of tracking algorithms.

This thesis focuses on visual tracking based on deep learning. An end-to-end framework built on CNNs is constructed to estimate the state of the target in video. In addition, the combination of correlation filters with deep features is studied, using a Siamese network and a residual network to learn and transfer features from images. The model in the end-to-end framework consists of three convolutional layers and three fully connected layers, and it is optimized in two ways. The first appends a spatial pyramid pooling network (SPPNet) to the model so that it can handle multi-scale input, which lets the algorithm adapt to scale variation. The second combines appearance features from shallow convolutional layers with semantic features from deeper layers to model the target, which better distinguishes the target from the background. For the further research based on the Siamese network and correlation filter, several attention mechanisms are applied to assign different weights to different feature maps, which helps the algorithm adapt to specific tracking scenes. Both frameworks were implemented and optimized as described. Compared with the baseline, tracking performance and robustness improve while speed stays the same.
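The abstract credits spatial pyramid pooling with letting the network accept multi-scale input. The sketch below illustrates that general idea only and is not the thesis's implementation: the pyramid levels (1, 2, 4), the use of max pooling, and the names spatial_pyramid_pool and make_map are all assumptions made for this example. It pools a feature map of any spatial size into a vector whose length depends only on the channel count and the pyramid levels.

```python
def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a C x H x W feature map (nested lists) into a fixed-length vector.

    For each pyramid level n, every channel is divided into an n x n grid of
    bins and the maximum of each bin is kept. The levels are an assumption
    chosen for illustration, not values taken from the thesis.
    """
    pooled = []
    for channel in feature_map:
        height, width = len(channel), len(channel[0])
        for n in levels:
            for i in range(n):
                # Bin rows: floor for the start, ceil for the end, so bins
                # cover the whole map and are never empty.
                r0 = (i * height) // n
                r1 = ((i + 1) * height + n - 1) // n
                for j in range(n):
                    c0 = (j * width) // n
                    c1 = ((j + 1) * width + n - 1) // n
                    pooled.append(max(channel[r][c]
                                      for r in range(r0, r1)
                                      for c in range(c0, c1)))
    return pooled


def make_map(channels, height, width):
    """Build a synthetic feature map (hypothetical helper for the demo)."""
    return [[[(ch + 1) * (r * width + c) for c in range(width)]
             for r in range(height)]
            for ch in range(channels)]


# Two inputs with different spatial sizes yield vectors of the same length.
small = spatial_pyramid_pool(make_map(2, 5, 7))
large = spatial_pyramid_pool(make_map(2, 13, 9))
print(len(small), len(large))
```

Because the output length is channels x (1 + 4 + 16) regardless of the input height and width, fully connected layers placed after such a pooling stage can receive crops of different scales without resizing.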
Keywords/Search Tags: visual tracking, deep learning, CNN, Siamese network