
Research On Visual Tracking Based On The Fusion Of Convolutional Features And Temporal Information

Posted on: 2020-11-28    Degree: Master    Type: Thesis
Country: China    Candidate: D Li    Full Text: PDF
GTID: 2518306308961479    Subject: Computer application technology
Abstract/Summary:
Computer vision is an important research direction of artificial intelligence. As a significant branch of computer vision, visual tracking has received much attention from scholars at home and abroad. Traditional algorithms are based on correlation filters and use handcrafted features to model the target. Because handcrafted features cannot fully describe the target, the performance of such algorithms is limited. Deep learning combines low-level features into more abstract high-level features and thus alleviates this problem. Most deep-learning-based visual tracking algorithms rely on convolutional neural networks: features are first extracted from the target with a CNN, and the features of the last convolutional layer are then used to locate the target by regression or classification. Although such methods achieve good performance, they still have limitations: they ignore the correlation between video frames and fail to make full use of the convolutional features of different layers. This thesis proposes a tracking model that fuses convolutional features and temporal information (FCFTI) to improve tracking performance. The work of this thesis is outlined as follows:

(1) Most tracking algorithms characterize the target only with the spatial features extracted by the last convolutional layer; such methods neither make full use of the features of different convolutional layers nor mine the temporal correlation between video frames. Based on the ADNet model, which uses only the features of the last convolutional layer to describe the target, an improved method that fuses convolutional features (FCF) is proposed; it combines shallow features with deep features. A further improvement that fuses spatio-temporal information (FSTI) is proposed to address ADNet's inability to mine the temporal correlation between video frames: a recurrent neural network is integrated into the model, and the target features of the current frame are fused with the target features of historical frames through the recurrent network to form the final representation of the target. Combining these two improvements yields the proposed FCFTI tracking model.

(2) Several groups of comparative experiments are conducted on the OTB datasets. First, the ADNet model is selected as the baseline. Second, the tracking performance of each improved method proposed in this thesis is evaluated against the baseline. Then, the FCFTI model is compared with several well-recognized algorithms with strong performance. Experimental results show that fusing the features of the different convolutional layers and integrating spatio-temporal information each improve tracking performance, and that the proposed model outperforms the other comparison models.
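To make the two ideas above concrete, the following is a minimal PyTorch sketch of fusing shallow and deep convolutional features (the FCF idea) and carrying them across frames with a recurrent network (the FSTI idea). The VGG-16 backbone, the specific layer split, the GRU-based temporal fusion, the 112x112 patch size, and the two-class head are all illustrative assumptions, not the actual ADNet/FCFTI architecture described in the thesis.

```python
# Sketch only: a generic shallow/deep feature fusion plus GRU temporal fusion.
# Backbone choice, layer indices, and head are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


class FusedFeatureTracker(nn.Module):
    def __init__(self, hidden_dim=512):
        super().__init__()
        vgg = models.vgg16(weights=None).features
        self.shallow = vgg[:10]   # through conv2 block: edges/textures (assumed split)
        self.deep = vgg[10:23]    # through conv4 block: semantic features (assumed split)
        # Project the concatenated shallow+deep descriptors to a fixed-size vector.
        self.proj = nn.Linear(128 + 512, hidden_dim)
        # GRU fuses the current frame's features with historical target features.
        self.rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        # Simple target/background head standing in for the tracker's decision stage.
        self.head = nn.Linear(hidden_dim, 2)

    def forward(self, patch, hidden=None):
        # patch: (B, 3, 112, 112) candidate region cropped around the target.
        s = self.shallow(patch)                          # (B, 128, 28, 28)
        d = self.deep(s)                                 # (B, 512, 14, 14)
        # Spatially pool both levels so they can be concatenated channel-wise.
        s_vec = F.adaptive_avg_pool2d(s, 1).flatten(1)   # (B, 128)
        d_vec = F.adaptive_avg_pool2d(d, 1).flatten(1)   # (B, 512)
        fused = torch.relu(self.proj(torch.cat([s_vec, d_vec], dim=1)))
        # One frame is one time step; `hidden` carries the historical frames' features.
        out, hidden = self.rnn(fused.unsqueeze(1), hidden)
        score = self.head(out.squeeze(1))
        return score, hidden


if __name__ == "__main__":
    tracker = FusedFeatureTracker()
    hidden = None
    for _ in range(3):                         # iterate over consecutive frames
        frame_patch = torch.randn(1, 3, 112, 112)
        score, hidden = tracker(frame_patch, hidden)
    print(score.shape)                         # torch.Size([1, 2])
```

In this sketch the recurrent hidden state plays the role of the historical target representation: each new frame's fused shallow/deep descriptor is combined with it before the decision head, mirroring the fusion of current-frame and historical-frame features described above.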
Keywords/Search Tags: computer vision, visual tracking, convolutional neural network, recurrent neural network, reinforcement learning