
Research on Visual Object Tracking in Complex Scene

Posted on: 2023-09-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J F Zhuang
GTID: 1528306914476454
Subject: Information and Communication Engineering
Abstract/Summary:
Visual object tracking plays a vital role in computer vision, with applications such as intelligent video surveillance, intelligent transportation, autonomous driving, and computer-aided medicine. Given the location of a target in the first frame, object tracking aims to predict its location in every subsequent frame. Despite remarkable progress in recent years, visual tracking still faces challenges such as occlusion, scale variation, background clutter, fast motion, illumination variation, and appearance changes. Further improving tracking accuracy in complex scenes while maintaining real-time speed remains both the focus and the main difficulty of current research in the field. This thesis takes object tracking in complex scenes as its primary line of research and addresses it from four aspects: the receptive field of deep neural networks, hard example mining, knowledge distillation, and Transformer design. The effectiveness of the proposed algorithms is verified through extensive experiments on public datasets. The research contents and main contributions are summarized as follows:

(1) The receptive fields of deep convolutional neural networks are designed manually, which is one of the factors limiting visual tracking performance. This thesis proposes an Auto-Selecting Receptive Field (ASRF) network. ASRF is built on a Siamese network with two novel modules: a Selective Receptive Field Block (SRFB) and a Multi-Scale Receptive Field (MSRF) module. SRFB adaptively adjusts the receptive field size of each neuron according to multiple scales of input information, and MSRF goes a step further by selecting useful cues from receptive fields of different scales. ASRF performs favorably against state-of-the-art trackers on various benchmarks while running faster than real time.

(2) Because the backbone network is trained for the classification task, semantic backgrounds reduce the robustness of the tracking-by-detection framework. To address this problem, this thesis proposes a tracking algorithm based on an attention mechanism. The attention mechanism is built with an hourglass network, which forces the network to focus on the features that differ most between the target and the semantic background, yielding higher robustness. In addition, a triplet loss is introduced into the framework to minimize intra-class variation and maximize inter-class variation; the triplet loss also increases the amount of training data. Experiments show that these two innovations improve tracking performance in complex scenes.

(3) Knowledge Distillation (KD) has rarely been applied to visual tracking. This thesis investigates why standard KD techniques are ineffective for visual tracking algorithms and designs two new loss functions, which are integrated into the proposed Ensemble Learning (EL) framework. The first, the Hard-example Aware Loss (HALoss), mines hard examples during knowledge distillation. The second, the Response Map Loss (RMLoss), transfers response-map hints from one Siamese network to another. EL treats two Siamese networks as students and lets them learn collaboratively, which yields better performance than training each student individually. Experimental results show that the EL framework improves tracking performance without increasing the number of parameters or the amount of computation.

(4) Transformer-based tracking algorithms suffer from a high computational burden. To address this problem, this thesis presents an efficient and effective Transformer tracker, the Shared Transformer (STrans). After analyzing the redundant computation in existing Transformer trackers, we find that a shared Transformer mechanism can replace the separate self-attention and cross-attention mechanisms. We further design key components, including the backbone, encoder, decoder, and position encoding, to improve tracking performance. STrans achieves state-of-the-art tracking accuracy with less computation.
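Contributions (1) and (3) both build on the Siamese tracking formulation, in which a response map is obtained by correlating template features with search-region features, and the peak of that map marks the predicted target position. The following is a minimal NumPy sketch of that correlation step, assuming single-scale features have already been extracted; the function name `cross_correlation` and all shapes are illustrative and not taken from the thesis.

```python
import numpy as np

def cross_correlation(template, search):
    """Slide a (C, h, w) template feature over a (C, H, W) search feature.

    Returns an (H - h + 1, W - w + 1) response map ('valid' correlation);
    its peak marks the most likely target position.
    """
    c, h, w = template.shape
    c2, H, W = search.shape
    assert c == c2, "channel counts must match"
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Inner product between the template and one search window.
            out[i, j] = np.sum(template * search[:, i:i + h, j:j + w])
    return out

rng = np.random.default_rng(0)
search = rng.standard_normal((8, 22, 22))    # stand-in search-region features
template = search[:, 5:11, 9:15].copy()      # target embedded at offset (5, 9)
resp = cross_correlation(template, search)
peak = np.unravel_index(np.argmax(resp), resp.shape)
print(resp.shape, peak)
```

Real trackers compute this with a batched convolution on learned features; the explicit loops here only make the sliding inner product visible.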
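The SRFB in contribution (1) lets each neuron adapt its receptive field to multi-scale input. One well-known way to realize that idea is selective-kernel fusion: several branches with different receptive fields are combined using per-channel softmax weights derived from global context. The sketch below assumes that style of design; the branch count, the reduction size `d`, the helper name `selective_fusion`, and the random weights are hypothetical, and the thesis's actual SRFB and MSRF may be structured differently.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def selective_fusion(branches, w_reduce, w_select):
    """Fuse K branch outputs with per-channel attention over branches.

    branches: (K, C, H, W) features from K receptive-field sizes.
    w_reduce: (d, C) shared projection of the pooled descriptor.
    w_select: (K, C, d) per-branch projections back to C channels.
    Returns the fused (C, H, W) map and the (K, C) selection weights.
    """
    u = branches.sum(axis=0)               # (C, H, W) aggregate of all branches
    s = u.mean(axis=(1, 2))                # (C,) global average pooling
    z = np.maximum(w_reduce @ s, 0.0)      # (d,) compact descriptor with ReLU
    logits = w_select @ z                  # (K, C) score per branch and channel
    a = softmax(logits, axis=0)            # branches compete for each channel
    fused = np.einsum("kc,kchw->chw", a, branches)
    return fused, a

rng = np.random.default_rng(0)
K, C, H, W, d = 3, 16, 8, 8, 4
branches = rng.standard_normal((K, C, H, W))   # stand-ins for dilated-conv outputs
fused, a = selective_fusion(branches,
                            rng.standard_normal((d, C)),
                            rng.standard_normal((K, C, d)))
print(fused.shape, np.allclose(a.sum(axis=0), 1.0))
```

Because the weights are recomputed from each input, the effective receptive field of every output channel changes with the content rather than being fixed by hand.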
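Contribution (2) uses a triplet loss to pull samples of the same target together and push semantic-background samples away. The sketch below uses the common hinge form with a margin on L2-normalized embeddings; the margin value, the squared Euclidean distance, and the normalization are assumptions, since the abstract does not give the thesis's exact formulation.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss on L2-normalized embeddings (rows are samples).

    Penalizes a triplet unless the negative is at least `margin` farther
    (in squared distance) from the anchor than the positive is.
    """
    def l2n(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    a, p, n = l2n(anchor), l2n(positive), l2n(negative)
    d_ap = np.sum((a - p) ** 2, axis=-1)   # intra-class distance
    d_an = np.sum((a - n) ** 2, axis=-1)   # inter-class distance
    return np.mean(np.maximum(d_ap - d_an + margin, 0.0))

anchor = np.array([[1.0, 0.0], [1.0, 0.0]])
positive = np.array([[2.0, 0.0], [0.0, 1.0]])   # row 0 normalizes to the anchor
negative = np.array([[0.0, 3.0], [1.0, 0.0]])
print(triplet_loss(anchor, positive, negative))  # prints 1.1
```

The first triplet is already well separated and contributes zero; the second has its positive and negative swapped and contributes 2 + 0.2 = 2.2, so the batch mean is 1.1.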
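Contribution (3) has two Siamese students teach each other through a response-map loss and a hard-example-aware loss. The abstract does not define either loss, so the sketch below substitutes two familiar techniques purely for illustration: a temperature-softened KL divergence between the two students' response maps (a hypothetical stand-in for RMLoss) and averaging over only the hardest samples in the batch, OHEM-style (a hypothetical stand-in for HALoss). All function names, the temperature, and the keep ratio are assumptions.

```python
import numpy as np

def spatial_softmax(resp, temperature):
    """Turn (N, H, W) response maps into per-sample distributions over positions."""
    flat = resp.reshape(resp.shape[0], -1) / temperature
    flat = flat - flat.max(axis=1, keepdims=True)
    p = np.exp(flat)
    return p / p.sum(axis=1, keepdims=True)

def response_map_kl(student, peer, temperature=2.0):
    """Per-sample KL(peer || student) between softened response maps.

    Hypothetical stand-in for RMLoss. The peer map acts as the target;
    in an autodiff framework its gradient would be stopped.
    """
    p = spatial_softmax(peer, temperature)
    q = spatial_softmax(student, temperature)
    return np.sum(p * (np.log(p) - np.log(q)), axis=1)

def hard_example_mean(per_sample_loss, keep_ratio=0.5):
    """Average only the hardest (largest-loss) samples, OHEM-style.

    Hypothetical stand-in for the hard-example mining in HALoss.
    """
    k = max(1, int(round(keep_ratio * per_sample_loss.size)))
    hardest = np.sort(per_sample_loss)[-k:]
    return hardest.mean()

rng = np.random.default_rng(0)
maps_a = rng.standard_normal((8, 17, 17))   # student A response maps
maps_b = rng.standard_normal((8, 17, 17))   # student B response maps

# Each student distils from the other; no larger teacher network is needed.
loss_a = hard_example_mean(response_map_kl(maps_a, maps_b))
loss_b = hard_example_mean(response_map_kl(maps_b, maps_a))
print(loss_a, loss_b)
```

Since both students keep their original architectures, this kind of collaborative training adds no parameters or inference cost, which is consistent with the claim in contribution (3).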
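Contribution (4) states that a shared Transformer mechanism can replace separate self-attention and cross-attention. One plausible reading is joint attention over the concatenated template and search tokens, where a single pass contains both search-to-search (self) and search-to-template (cross) interactions. The NumPy sketch below illustrates only that reading; the single-head form, token counts, and random projections `wq`, `wk`, `wv` are assumptions, not the STrans design.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q_tokens, kv_tokens, wq, wk, wv):
    """Single-head scaled dot-product attention."""
    q, k, v = q_tokens @ wq, kv_tokens @ wk, kv_tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
d = 32
z = rng.standard_normal((64, d))    # template tokens (e.g. 8x8 patches)
x = rng.standard_normal((256, d))   # search tokens (e.g. 16x16 patches)
wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

# One shared pass over the joint sequence: every token attends to both
# template and search tokens, covering self- and cross-attention at once.
joint = np.concatenate([z, x], axis=0)
shared_out = attention(joint, joint, wq, wk, wv)
z_out, x_out = shared_out[:64], shared_out[64:]
print(z_out.shape, x_out.shape)
```

This is not numerically identical to running self-attention and cross-attention as two separate calls, because the softmax here normalizes over template and search keys together; the point is that one set of weights and one attention call can model both kinds of interaction.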
Keywords: visual object tracking, complex scene, Siamese network, attention mechanism, ensemble learning