
Learning Visual Attention And Robust Deep Feature For Object Detection And Tracking

Posted on: 2020-04-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Wang
Full Text: PDF
GTID: 1368330602457343
Subject: Computer application technology
Abstract/Summary:
Object detection and tracking are fundamental tasks in computer vision and key techniques in intelligent video surveillance systems. With the help of deep learning, these domains have advanced remarkably. However, object detection and tracking remain challenging due to the complexity of data, scenes, and environments. This paper targets these complex factors from the perspective of visual attention and robust deep feature learning. Specifically, the research can be divided into the following parts: adaptive weighted multi-modal salient object detection, target-driven visual attention generation for visual tracking, hard positive generation for visual tracking, tracking by natural language specification, and hard person identity mining for cross-camera tracking.

Firstly, we propose an adaptive multi-modal information fusion mechanism. For deep learning-based saliency detection, we propose a quality-aware multi-modal salient object detection framework based on deep reinforcement learning, which casts the adaptive weighting of different modal data as a decision-making problem. The proposed algorithm is validated on two kinds of dual-modal saliency detection benchmarks.

Secondly, we propose to jointly utilize global and local candidate samples to handle issues in the current tracking-by-detection framework, such as heavy occlusion, scale variation, and reappearance. Specifically, we achieve global proposal generation via target-driven visual attention maps. To better capture motion information, we use a 3D CNN to extract features from several consecutive video frames; meanwhile, we obtain features of the target object with a 2D CNN. These two features are concatenated and fed into an up-sampling network, which is trained with mean squared error and adversarial loss functions. The training data can be obtained from existing tracking datasets without any additional annotation. We first obtain rectangular regions according to the saliency regions and conduct Gaussian sampling. In the tracking procedure, the global and local proposals are all fed into the classifier, and the proposal with the maximum score is chosen as the result for the current video frame. A short- and long-term update strategy is adopted to update the model. Extensive experiments on multiple tracking benchmarks validate the effectiveness of the proposed algorithm.

Thirdly, few-sample learning is another key issue in visual tracking. Deep networks only work well when trained with large-scale data, so there is a gap between the few-sample visual tracking task and data-hungry deep neural networks, which may limit tracking performance. Besides, the shortage of hard samples in practical training datasets also makes trackers less robust to challenging factors. To handle these issues, this paper proposes to actively generate massive hard samples to bridge this gap. Specifically, it constructs the manifold of the target object with a variational auto-encoder and then decodes massive positive target object images. Meanwhile, we also use a background patch to occlude the target object, making the tracker more robust to occlusion. Massive hard training samples can be obtained via these techniques, which makes the baseline tracker perform better.

Fourthly, the currently most popular setting of visual tracking is initialized with one bounding box representing the target object in color images. However, the appearance model alone is not enough in a practical tracking procedure, especially when facing complex backgrounds, fast motion, etc. In this paper, we take the structural information among training samples into consideration with graph convolutional networks and introduce natural language specification for more robust deep feature learning. We also adopt an encoder-decoder framework to generate a global attention map based on the natural language and the target object patch to deal with reappearance, fast motion, heavy occlusion, etc. Our experiments validate that tracking performance can be enhanced significantly with the guidance of natural language.

Fifthly, for person tracking under the cross-camera scenario, one regular pipeline is to use a triplet loss for feature learning and compare the differences between human images in the feature space. Such methods adopt local mini-batch construction and ignore the correlation between the average feature of a person identity and each image feature of that identity, which may limit their final recognition results. This paper first estimates the attributes of each image in the person re-identification dataset; the attribute distance between different pedestrians can then be measured for global mini-batch construction. In the training phase, this paper considers the correlation between the average feature and each image feature of the same person and uses it as an additional criterion to optimize the neural network, adding this regularization term to the triplet loss function. Extensive experiments on both pedestrian attribute recognition and person re-identification datasets validate the effectiveness of this algorithm.
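The regularized triplet objective described in the fifth part can be sketched as follows. This is a minimal NumPy illustration, not the thesis code: Euclidean distances, the function name, the weight `lam`, and the margin value are all illustrative assumptions.

```python
import numpy as np

def triplet_with_center_reg(anchor, positive, negative, id_features,
                            margin=0.3, lam=0.1):
    """Hypothetical sketch: standard triplet loss plus a regularizer that
    pulls each image feature toward the average feature of its identity."""
    d_ap = np.linalg.norm(anchor - positive)   # anchor-positive distance
    d_an = np.linalg.norm(anchor - negative)   # anchor-negative distance
    triplet = max(0.0, d_ap - d_an + margin)   # hinge-style triplet term
    center = id_features.mean(axis=0)          # average feature of the identity
    reg = np.linalg.norm(anchor - center) ** 2 # pull anchor toward its center
    return triplet + lam * reg
```

In this reading, `id_features` would hold all image features sharing the anchor's identity within the globally constructed mini-batch, and `lam` balances the triplet term against the center-alignment regularizer.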
Keywords/Search Tags:Visual Attention, Deep Feature Learning, Salient Object Detection, Visual Tracking, Multi-modal Fusion