Research On Video Object Tracking Algorithm Based On Deep Learning

Posted on: 2021-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: X J Wang
Full Text: PDF
GTID: 2428330614971397
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid growth of artificial intelligence, video object tracking has become one of the hot research topics in computer vision, with a wide range of applications such as intelligent video surveillance, intelligent transportation, intelligent human-computer interaction, and intelligent medical diagnosis. In recent years, the field has developed continuously, and single-object tracking algorithms based on deep learning have emerged one after another. However, because the target and its surrounding environment change frequently during tracking, tracking an object accurately and robustly in complex scenes remains extremely challenging. Based on the Siamese network tracking framework, this dissertation proposes three single-object tracking algorithms from the aspects of model training, network structure, and feature fusion. The main contributions are summarized as follows.

(1) A visual tracking algorithm based on a discriminative context-aware correlation filter network is proposed. Existing tracking algorithms that combine a Siamese network with a correlation filter use a cosine window to reduce the boundary effect, which reduces the context information around the target and may limit tracking accuracy. First, based on the Siamese network tracking framework, this dissertation redefines the loss function of the context-aware correlation filter and interprets the filter as a special layer of the convolutional network, which is combined with the Siamese network so that the entire network can be trained end to end. Then, a channel attention module is embedded in the template branch of the network to select semantic attributes suited to the appearance changes of the target. Finally, this dissertation proposes a high-confidence update strategy that combines the average peak-to-correlation energy and the peak value of the response map to decide whether to update the model.

(2) A visual tracking algorithm based on a deep Siamese network is proposed. Existing tracking algorithms based on the Siamese network use the shallow AlexNet for feature extraction. This approach does not take full advantage of deep convolutional networks and cannot handle complex changes in the target and its surroundings, which may limit the robustness of the tracking algorithm. First, based on the Siamese network tracking framework, the original AlexNet is replaced with a modified VGG-16 network to track the target robustly. Then, to improve the tracker's ability to deal with background clutter, a spatial mask is used to suppress uncorrelated background noise at both the image level and the feature level. Finally, to improve the tracking speed, the deep Siamese network is used only to estimate the target position, and on this basis the discriminative correlation filter network is used to estimate the scale.

(3) A visual tracking algorithm based on an improved twofold Siamese network is proposed. Existing tracking algorithms based on the Siamese network use either shallow or deep convolutional features for object tracking. Single-level convolutional features cannot achieve both accuracy and robustness, which limits tracking performance in complex scenes. In this dissertation, the twofold Siamese network is formed by using the context-aware correlation filter network as the appearance branch and the deep Siamese network as the semantic branch. The shallow and deep convolutional networks extract the spatial structure information and the high-level semantic information of the target, respectively. Then, the average peak-to-correlation energy is used to compute the weights of the appearance branch and the semantic branch, and their response maps are adaptively fused. Finally, to cope with background clutter, a multi-domain convolutional neural network is used to detect the locations of multiple peaks in the fused response map.
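
The average peak-to-correlation energy (APCE) used in contributions (1) and (3) can be illustrated with a minimal sketch. The abstract does not give the formulas, so this assumes the commonly used definition, APCE = |F_max - F_min|^2 / mean((F - F_min)^2), for a response map F. The function names (`apce`, `should_update`, `fuse`), the ratio thresholds `beta_apce` and `beta_peak`, and the normalized-APCE fusion weights are hypothetical choices made for illustration, not the dissertation's actual settings. The sketch uses nested lists so that it runs with the standard library only.

```python
from typing import List, Sequence

ResponseMap = Sequence[Sequence[float]]


def apce(response: ResponseMap) -> float:
    """Average peak-to-correlation energy of a 2-D response map (assumed definition)."""
    values = [v for row in response for v in row]
    f_max, f_min = max(values), min(values)
    # Mean squared deviation of every response value from the minimum.
    energy = sum((v - f_min) ** 2 for v in values) / len(values)
    if energy == 0.0:
        return 0.0  # flat map: there is no distinguishable peak
    return (f_max - f_min) ** 2 / energy


def should_update(
    response: ResponseMap,
    apce_history: Sequence[float],
    peak_history: Sequence[float],
    beta_apce: float = 0.5,  # hypothetical ratio threshold
    beta_peak: float = 0.5,  # hypothetical ratio threshold
) -> bool:
    """Hypothetical high-confidence test: update the model only when both the
    APCE and the peak value reach a fraction of their historical means."""
    if not apce_history or not peak_history:
        return True  # no history yet, so accept the frame
    peak = max(v for row in response for v in row)
    mean_apce = sum(apce_history) / len(apce_history)
    mean_peak = sum(peak_history) / len(peak_history)
    return apce(response) >= beta_apce * mean_apce and peak >= beta_peak * mean_peak


def fuse(appearance: ResponseMap, semantic: ResponseMap) -> List[List[float]]:
    """Fuse two same-sized response maps with weights proportional to their APCE
    (an assumed weighting rule)."""
    a, s = apce(appearance), apce(semantic)
    total = a + s
    w_a = 0.5 if total == 0.0 else a / total
    w_s = 1.0 - w_a
    return [
        [w_a * x + w_s * y for x, y in zip(row_a, row_s)]
        for row_a, row_s in zip(appearance, semantic)
    ]
```

Under this definition, a 3x3 map with a single unit peak has an APCE of about 9, whereas a 3x3 map with five unit peaks has an APCE of about 1.8. The single-peak branch therefore receives the larger fusion weight, and the multi-peak frame would be rejected by `should_update` against a history of sharp responses.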
Keywords/Search Tags: Object tracking, Deep learning, Correlation filter, Siamese network, Feature fusion