Research On Visual Tracking Methods Based On Object Representation Enhancement

Posted on:2021-03-16

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Y Cao

GTID:1488306311971649

Subject:Intelligent information processing

Abstract/Summary:

High-performance visual object tracking over massive volumes of video has been widely used in civil and military fields such as security monitoring, autonomous driving, human-computer interaction, and precision-guided munitions. It is a current research focus in computer vision and artificial intelligence. An ideal visual object tracking system should be able to accurately capture and track an object over a long period. However, complex backgrounds and highly dynamic scene changes considerably limit tracking performance. How to better model the object of interest, enhance its representation, and accurately estimate its position are key issues in tracking, with great theoretical and practical significance. Traditional modeling methods usually use hand-crafted features such as colors, gradients, and key points to describe the spatial-temporal variations of the object. However, these methods suffer from weak representation ability and poor adaptability, which makes them prone to tracking instability in complex environments. In recent years, the development of deep neural networks has made it possible to automatically learn features and architectures with stronger discriminative ability from large-scale data, providing new ideas and approaches for solving tracking problems. Based on the theory of deep neural networks, this doctoral dissertation focuses on several key issues in object tracking and proposes corresponding solutions. The main research contents and contributions are as follows:

1. The problem of insufficient spatial-temporal representation ability in visual object tracking is studied. Compared with traditional features, deep features can provide a more robust feature representation for tracking models. However, owing to the limitations of feature selection and exploitation methods, it is difficult to capture the variation of the object effectively in spatial-temporal modeling. To deal with this problem, a tracking algorithm based on hierarchical spatial-temporal context learning networks is proposed. First, the algorithm exploits fine-grained and semantic representations from multi-source convolutional layers to assist the construction of spatial context prior models. Second, a mapping neural network is used to learn the dynamic transformation relationship between the convolutional features and the training confidence map. Finally, the training confidence index is used to update the tracking networks adaptively, which enhances the spatial-temporal modeling ability. Experimental results demonstrate that the hierarchical spatial-temporal context learning networks effectively enhance the generalization ability of the algorithm in complex unknown scenarios and improve tracking performance.

2. The problem of insufficient deep feature encoding ability in visual object tracking is studied. In general, convolutional neural networks treat the scalar-level feature as the basic feature unit and ignore the aggregation effect of features, so a single encoding dimension makes it difficult to obtain sufficient local representation information. In addition, the feature transformation relationship learned by a scalar classification unit is extremely limited. To deal with this problem, a tracking algorithm based on dual attention capsule networks is proposed. The algorithm first uses capsule aggregation networks to aggregate the position-aware scalar deep convolutional features into vectorized capsule features, and then uses capsule group attention and capsule penalty attention to realize discriminative learning within and between capsule entities, which effectively improves the encoding ability of the basic network unit. Experimental results illustrate that the dual attention capsule networks are more robust and handle the tracking of aerial rigid objects better than existing tracking networks. Furthermore, such a network is also applicable to feature-encoding problems in object detection, action recognition, image segmentation, and related tasks.

3. The problem of insufficient long-term modeling ability in visual object tracking is studied. In long-term tracking tasks, transfer-learning models are likely to cause tracking drift because they are insensitive to interference from objects of the same category, whereas Siamese models are suitable for tracking arbitrary objects and can capture an object's long-term variations. To deal with this problem, a tracking algorithm based on dynamic weighted prediction networks is proposed. The algorithm uses a dynamic weighting network to measure the difference in prediction preferences among cross-correlation response maps, and proposes a dynamic residual mapping mechanism to assist standard dynamic weighting. Building on the weighting network, an online pyramid re-detection mechanism that considers the global view is then introduced, which alleviates the drift problem in continuous tracking. Experimental results demonstrate that the dynamic weighted prediction networks enhance tracking accuracy and stability in complex long-term scenarios.

4. The problem of network architecture redundancy in visual object tracking is studied. At present, most tracking algorithms rely on complex large-scale networks to construct feature models, which increases hardware resource requirements and reduces tracking efficiency, making them difficult to deploy on mobile devices. To deal with this problem, a tracking algorithm based on lightweight multi-level fusion networks is proposed. The algorithm first constructs a lightweight model by reducing the number of network parameters, floating-point operations, and memory access cost, and then uses channel shuffle and neural architecture search techniques to construct multi-level fusion models. Finally, high-efficiency lightweight fusion networks are built by extending the constructed models. Experimental results illustrate that the lightweight multi-level fusion networks balance network complexity and tracking performance well, providing a technological approach for tracking networks in miniaturized mobile systems.

The research content and innovative methods in this doctoral dissertation provide theoretical and methodological support for solving high-performance tracking problems in complex environments, and lay a good technological foundation for integrated detection-tracking-recognition systems.
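
The fourth contribution mentions channel shuffle as one building block of the lightweight fusion models. As a rough illustration only, and not the dissertation's implementation, the generic channel shuffle operation (as popularized by ShuffleNet-style networks) can be sketched in plain Python. The function name `channel_shuffle` and the use of a flat list of per-channel items in place of real feature maps are assumptions made for this sketch.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def channel_shuffle(channels: Sequence[T], groups: int) -> List[T]:
    """Interleave channels across groups (generic channel shuffle).

    `channels` is a flat sequence of per-channel items; these are
    hypothetical stand-ins for feature maps. The number of channels
    must be divisible by `groups`.
    """
    n = len(channels)
    if groups <= 0 or n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    # Conceptually: view the channels as a (groups, per_group) grid,
    # transpose it to (per_group, groups), then flatten again.
    return [
        channels[g * per_group + k]
        for k in range(per_group)
        for g in range(groups)
    ]


if __name__ == "__main__":
    # Six channels in two groups: [0, 1, 2] and [3, 4, 5].
    print(channel_shuffle([0, 1, 2, 3, 4, 5], groups=2))  # [0, 3, 1, 4, 2, 5]
```

After the shuffle, each consecutive block of channels contains one channel from every original group, so a following grouped convolution can mix information across groups at no extra parameter cost.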

Keywords/Search Tags:

Visual Object Tracking, Deep Neural Networks, Convolutional Neural Networks, Capsule Neural Networks, Siamese Neural Networks, Attention Mechanisms, Neural Architecture Search

Related items

1	Visual Tracking Algorithms Based On Convolutional Neural Networks
2	Study On Neural Networks Machine And Its Application In Control
3	Action Recognition Based On Deep Neural Networks With Visual Attention Mechanism
4	Research On Parallel Computing Architecture Of Siamese Network Algorithm
5	Visual Object Tracking Based On Convolutional Neural Networks
6	Siamese Neural Network Models For Thermal Infrared Object Tracking
7	VLSI Optimizations And Implementations For Convolutional Neural Networks
8	Neural Architecture Design And Training Method For Efficient Deep Neural Networks
9	Target Tracking Research Based On Deep Siamese Convolutional Neural Networks
10	Research On Visual Object Tracking Algorithm Based On Siamese Neural Network