
Research On Single Target Visual Tracking Algorithm Based On Convolutional Neural Network

Posted on: 2022-07-30
Degree: Master
Type: Thesis
Country: China
Candidate: S W Li
GTID: 2518306329977189
Subject: Software engineering
Abstract/Summary:
Visual tracking is currently one of the most important research directions in computer vision, with applications including visual navigation, intelligent video surveillance, smart cities, and military precision guidance. In real application scenarios, tracking is affected by deformation, changes in light intensity, rotation, and many other factors, so single-target tracking remains a very challenging task. In recent years, convolutional neural networks (CNNs) have been successfully applied to various image processing problems (such as target detection, object classification, and image segmentation), significantly improving performance on those tasks. This thesis studies in depth how to use convolutional neural networks appropriately to improve the performance of single-target trackers. The main work covers the following two aspects:

(1) Feature maps at different depths of a CNN show different characteristics. Shallow feature maps retain more spatial geometric information about the target, so they are more sensitive to changes in target state. Deep feature maps retain robust semantic information about the target, so they are not easily affected when the target undergoes geometric deformation or color changes. To enhance the representation ability of features at different levels, this thesis introduces the self-attention mechanism into the multi-domain convolutional neural network (MDNet) and designs SAMDNet. The self-attention mechanism is implemented by a spatial attention module and a channel attention module. The spatial attention module selectively aggregates the weighted sum of the features at all positions into the corresponding position of the original feature maps, which strengthens the association between similar features. The channel attention module integrates all feature maps and then weights the channel space to selectively emphasize the importance of each channel feature. In part of MDNet's training data, targets have the same semantics but belong to different categories, which reduces the discriminative ability of the network model. To solve this problem, a composite loss function is constructed, consisting of a classification loss function and an instance discrimination loss function. The classification loss function measures the loss of target classification, while the instance discrimination loss function increases the weight of the target in the current video sequence and suppresses its weight in other sequences.

(2) MDNet, which is built on a CNN framework, uses several down-sampling structures to obtain deep features of the target during feature extraction, and a considerable amount of detail is lost in the down-sampling process. Among the feature maps of different depths extracted by the CNN, the shallow feature maps in particular contain more spatial location information about the target object, and this information is essential for tracking tasks that require precise positioning. The deep feature maps carry robust semantic information about the target, which makes the discriminative ability of the network model less susceptible to tracking scenes involving changes in light intensity, object deformation, and rotation. Therefore, to address the problem that the tracking network cannot make full use of features at different levels, this thesis uses MDNet as the baseline algorithm and designs a multi-domain convolutional neural network based on multi-level feature aggregation (FAMDNet). By aggregating features of different levels, the tracker can make full use of them, which significantly improves the feature representation ability of the tracking network. The feature aggregation network designed in this thesis is still an end-to-end model and introduces no specific parameter settings, thus maintaining the universality of the target tracker. This thesis also designs a dedicated data augmentation strategy for network pre-training that takes into account the various situations that may occur in tracking scenes. The training samples are processed by simulating changes in target state in advance, so that the trained network model has stronger discriminative ability. In addition, this thesis embeds an anomaly detection module in the tracking algorithm and designs a complete anomaly response strategy to mitigate the template drift caused by dramatic changes in target state or by long-term tracking.

The algorithms designed in this thesis are evaluated with detailed quantitative and qualitative experiments on the widely used OTB and VOT2015 datasets. The experiments show that the proposed trackers achieve good tracking performance and outperform many top-performing algorithms, which verifies the effectiveness of the designed algorithms.
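
As a rough illustration of the spatial and channel attention modules described in the abstract, the sketch below applies a residual self-attention step to a tiny feature map in plain Python. It is not the thesis implementation. The function names (`attend`, `spatial_attention`, `channel_attention`), the dot-product similarity, the softmax normalization, the `scale` parameter, and the absence of learned projection layers are all assumptions made for illustration; the abstract does not specify these details.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(rows, scale=1.0):
    """Residual self-attention over a list of equal-length vectors.

    Each vector is compared with every vector (dot product, an assumed
    similarity), the similarities are normalized with softmax, and the
    resulting weighted sum of all vectors is added back to the original.
    """
    out = []
    for q in rows:
        weights = softmax([sum(a * b for a, b in zip(q, k)) for k in rows])
        agg = [sum(w * k[d] for w, k in zip(weights, rows))
               for d in range(len(q))]
        out.append([x + scale * a for x, a in zip(q, agg)])
    return out


def spatial_attention(feat, scale=1.0):
    """feat: list of N positions, each a list of C channel values.

    Every position receives a weighted sum of the features at all positions.
    """
    return attend(feat, scale)


def channel_attention(feat, scale=1.0):
    """Same operation applied across channels instead of positions."""
    channels = [list(col) for col in zip(*feat)]   # C vectors of length N
    mixed = attend(channels, scale)
    return [list(col) for col in zip(*mixed)]      # back to N x C


# Hypothetical 3-position, 2-channel feature map.
example = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(spatial_attention(example))
print(channel_attention(example))
```

In this sketch the two similar positions (the first and second rows of `example`) attend to each other more strongly than to the third, which is one way to read the abstract's statement that spatial attention strengthens the association between similar features.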
Keywords/Search Tags: Computer vision, Object tracking, Convolutional neural network, Self-attention mechanism, Feature aggregation