Research On Thermal Infrared Object Tracking Based On Deep Representation Learning

Posted on: 2022-10-27 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: Q Liu | Full Text: PDF
GTID: 1488306569487164 | Subject: Computer application technology
Abstract/Summary:
In recent years, as thermal infrared (TIR) imaging equipment has become cheaper and its imaging quality has improved, more and more civilian fields have begun to use it. As a result, intelligent analysis of TIR video is receiving increasing attention, and TIR object tracking is one of its key, indispensable technologies. However, because TIR images have no color, lack rich texture, and have fuzzy contours, existing feature models designed for RGB images struggle to extract strongly discriminative representations of TIR objects, so TIR trackers built on these feature models easily drift to distractors during tracking. To solve this problem, we propose to learn TIR-specific deep representation models for TIR tracking that improve the discriminative ability of the appearance model and thereby the accuracy and robustness of TIR trackers. Specifically, we propose the following methods.

(1) Existing hand-crafted features cannot represent TIR objects effectively because TIR images have no color, lack rich texture, and have fuzzy contours. To solve this problem, we transfer deep convolutional features pre-trained on an RGB image classification task to represent TIR objects. We then propose a multi-expert fused ensemble TIR tracker based on this pre-trained feature model. The method exploits the complementary characteristics of deep and shallow convolutional features: it builds a weak correlation filter tracker on each of several convolutional layers and obtains a corresponding response map from each. Finally, we propose a multi-expert decision fusion method based on KL divergence that integrates the multiple weak response maps into a stronger tracking result.

(2) Classification-based pre-trained deep features are not coupled with the TIR object tracking task. To solve this issue, we propose to learn hierarchical spatial-aware features that represent TIR objects under a matching task. The feature model consists of a hierarchical feature fusion module and a spatial variation aware module. The hierarchical feature fusion module fuses multi-layer convolutional features of different scales and different representation capabilities into a single feature space. The spatial variation aware module learns invariance to rotation, translation, and scaling of the object in the fused feature space. The whole feature model is trained offline, end-to-end, with a fully convolutional Siamese matching network. In the online tracking stage, the learned matching network is used directly to compute the similarity between the target template and the candidate samples, which yields the response map of the tracked object.

(3) A single deep global semantic feature cannot effectively distinguish the object from distractors. To solve this problem, we propose an appearance model that combines local structural features and global semantic features by minimizing relative entropy. The model consists of a local structure aware module, a global semantic enhancement module, and an adaptive relative entropy fusion module. The local structure aware module obtains discriminative local structural features by applying a spatial self-attention mechanism to the shallow convolutional features. The global semantic enhancement module applies a channel-wise self-attention mechanism to enhance the semantic features of the deep convolutional features. The two modules are adaptively integrated into an end-to-end learnable matching network through the relative entropy module. The local structural features and global semantic features obtained by optimizing the matching network learn to recognize the object from two complementary aspects, the local and the global, and thus achieve stronger discriminative ability than a single deep global semantic feature.

(4) Existing deep features cannot capture fine-grained features of TIR objects because TIR objects lack rich detail information. As a result, TIR trackers using these feature models have difficulty handling distractors. To solve this issue, we propose an appearance model based on a multi-task architecture that simultaneously learns TIR-specific discriminative features and fine-grained correlation features. We first use a multi-classification task to guide the generation of the TIR-specific discriminative features and use them in a discriminative matching task to distinguish the object from objects of different classes. We then propose a fine-grained aware module to capture the fine-grained correlation features and use them in a fine-grained matching task to distinguish the object from objects of the same class. The fine-grained aware module is composed of a local block correlation module and a pixel-level correlation module, which model the correlation between local regions and the correlation between feature units, respectively. The TIR-specific discriminative features and the fine-grained correlation features complement each other: the former separate objects across classes and the latter separate objects within a class. Finally, we integrate the discriminative matching task, the fine-grained matching task, and the multi-classification task into a unified multi-task framework that learns both kinds of features on the tracking task simultaneously.

(5) The field of TIR object tracking lacks a large-scale training dataset for learning TIR-specific appearance models and a test dataset for evaluating TIR trackers comprehensively. To solve this problem, we propose a large-scale, high-diversity standard TIR training and evaluation dataset for TIR object tracking. Data is the cornerstone of deep learning, and a large-scale TIR training dataset is the prerequisite for learning TIR-specific feature models. We first obtain 1,400 TIR videos by collecting them from the Internet and preprocessing them. These videos contain more than 600,000 frames and 47 object categories. We then obtain more than 700,000 object bounding boxes efficiently and accurately with a self-designed, tracking-based semi-automatic labeling tool. To evaluate TIR tracking methods fairly and comprehensively, we construct a general TIR object tracking evaluation dataset that contains 120 video sequences, 82,000 frames, 22 object categories, 12 kinds of challenges, and 4 scenarios. Evaluation and comparison experiments with a large number of tracking methods on the evaluation dataset demonstrate that the evaluation dataset is effective and that the proposed TIR training dataset can significantly improve the performance of deep trackers.
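
As an illustration of the KL-divergence-based fusion of response maps described in method (1), the following Python sketch treats each expert's response map as a probability distribution P_k and fuses them into the distribution Q that minimizes the sum of KL(P_k || Q) over the experts, which is simply their average. This objective is an assumption made for illustration and is not necessarily the fusion rule used in the dissertation; the function names (normalize, kl_divergence, fuse_responses, peak) and the 3x3 example maps are hypothetical.

```python
import math


def normalize(response):
    """Scale a non-negative 2-D response map so that its entries sum to 1."""
    total = sum(sum(row) for row in response)
    if total <= 0:
        raise ValueError("response map must have positive total mass")
    return [[v / total for v in row] for row in response]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two 2-D distributions of the same shape."""
    return sum(
        pv * math.log((pv + eps) / (qv + eps))
        for prow, qrow in zip(p, q)
        for pv, qv in zip(prow, qrow)
        if pv > 0  # cells with p = 0 contribute nothing
    )


def fuse_responses(responses):
    """Fuse expert maps into the Q minimizing sum_k KL(P_k || Q).

    Assumption for this sketch: with that objective, the minimizer is the
    element-wise mean of the normalized expert maps.
    """
    experts = [normalize(r) for r in responses]
    rows, cols = len(experts[0]), len(experts[0][0])
    k = len(experts)
    return [
        [sum(e[i][j] for e in experts) / k for j in range(cols)]
        for i in range(rows)
    ]


def peak(response):
    """Return (row, col) of the largest entry, i.e. the predicted position."""
    _, i, j = max(
        (v, i, j) for i, row in enumerate(response) for j, v in enumerate(row)
    )
    return i, j


# Hypothetical response maps from three layers ("experts").
expert_a = [[0, 1, 0], [1, 6, 1], [0, 1, 0]]  # peaks at the centre
expert_b = [[1, 1, 1], [1, 4, 1], [0, 1, 0]]  # peaks at the centre
expert_c = [[0, 1, 5], [1, 2, 1], [0, 0, 0]]  # pulled toward a distractor

fused = fuse_responses([expert_a, expert_b, expert_c])
print(peak(fused))  # (1, 1)
```

In this example two experts agree on the centre cell and one is pulled toward a distractor in the top-right corner; the fused map still peaks at the centre.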
Keywords/Search Tags: thermal infrared object tracking, deep learning, local structure feature, fine-grained correlation feature