Research On Visual Tracking Methods Based On Object Representation Enhancement

Posted on: 2021-03-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Cao
GTID: 1488306311971649
Subject: Intelligent information processing
Abstract/Summary:
High-performance visual object tracking over massive volumes of video has been widely used in civil and military fields such as security monitoring, autonomous driving, human-computer interaction, and precision-guided munitions. It is a current research focus in computer vision and artificial intelligence. An ideal visual object tracking system should be able to capture the object accurately and track it over a long period. However, complex backgrounds and highly dynamic scene changes still limit tracking performance. How to better model the object of interest, enhance its representation ability, and estimate its position accurately are the key issues in tracking, and they have great theoretical and practical significance. Traditional modeling methods usually use hand-crafted features such as colors, gradients, and key points to describe the spatial-temporal variations of the object. These methods suffer from weak representation ability and poor adaptability, which makes them prone to tracking instability in complex environments. In recent years, the development of deep neural networks has made it possible to learn features and architectures with stronger discriminative ability automatically from large-scale data, and such networks provide new ideas and approaches for solving tracking problems. Based on the theory of deep neural networks, this doctoral dissertation focuses on several key issues in object tracking and proposes corresponding solutions. The main research contents and contributions are as follows:

1. The problem of insufficient spatial-temporal representation ability in visual object tracking is studied. Compared with traditional features, deep features can provide a more robust feature representation for tracking models. However, because of limitations in how features are selected and exploited, it is difficult to capture the variation of the object effectively in spatial-temporal modeling. To deal with this problem, a tracking algorithm based on hierarchical spatial-temporal context learning networks is proposed. First, the algorithm exploits fine-grained and semantic representations from multi-source convolutional layers to assist the construction of spatial context prior models. Second, a mapping neural network is used to learn the dynamic transformation relationship between convolutional features and the training confidence map. Finally, the training confidence index is used to update the tracking networks adaptively, which enhances the spatial-temporal modeling ability. Experimental results demonstrate that the hierarchical spatial-temporal context learning networks effectively enhance the generalization ability of the algorithm in complex unknown scenarios and improve tracking performance.

2. The problem of insufficient deep feature encoding ability in visual object tracking is studied. In general, convolutional neural networks treat the scalar-level feature as the basic feature unit and ignore the aggregation effect of features, which makes it difficult to obtain sufficient local representation information from a single encoding dimension. In addition, the feature transformation relationship learned by a scalar classification unit is extremely limited. To deal with this problem, a tracking algorithm based on dual attention capsule networks is proposed. The algorithm first uses capsule aggregation networks to aggregate the position-aware scalar deep convolutional features into vectorized capsule features, and then uses capsule group attention and capsule penalty attention to realize discriminative learning within and between capsule entities, which effectively improves the encoding ability of the basic network unit. Experimental results illustrate that the dual attention capsule networks are more robust than existing tracking networks and handle the tracking of aerial rigid objects better. Furthermore, such networks can also be applied to feature-encoding problems in object detection, action recognition, image segmentation, and similar tasks.

3. The problem of insufficient long-term modeling ability in visual object tracking is studied. In long-term tracking tasks, transfer-learning models are likely to cause tracking drift because they are insensitive to interference from objects of the same category, while Siamese models are suitable for tracking arbitrary objects and are able to capture the object's long-term variations. To deal with this problem, a tracking algorithm based on dynamic weighted prediction networks is proposed. The algorithm uses a dynamic weighting network to measure the difference of prediction preferences in cross-correlation response maps, and proposes a dynamic residual mapping mechanism to assist standard dynamic weighting. An online pyramid-redetection mechanism that considers the global view is then introduced on top of the weighting network, which alleviates the drift problem in continuous tracking. Experimental results demonstrate that the dynamic weighted prediction networks enhance tracking accuracy and stability in complex long-term scenarios.

4. The problem of network architecture redundancy in visual object tracking is studied. At present, most tracking algorithms rely on complex large-scale networks to construct feature models, which increases hardware resource requirements and reduces tracking efficiency, making the algorithms difficult to deploy on mobile devices. To deal with this problem, a tracking algorithm based on lightweight multi-level fusion networks is proposed. The algorithm first constructs a lightweight model by reducing the number of network parameters, the floating-point operations, and the memory access cost, and then uses channel shuffle and neural architecture search techniques to construct multi-level fusion models. Finally, high-efficiency lightweight fusion networks are built by extending the constructed models. Experimental results illustrate that the lightweight multi-level fusion networks balance network complexity and tracking performance well, providing technological approaches for tracking networks in miniaturized mobile systems.

The research content and the innovative methods in this doctoral dissertation provide theoretical and methodological support for solving high-performance tracking problems in complex environments, and provide a good technological foundation for integrated detection-tracking-recognition systems.
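
As a rough illustration of the channel shuffle operation named in the fourth contribution, the sketch below shows the generic technique as it is commonly described for grouped convolutions: channels are split into groups and then interleaved so that each group in the next layer sees channels from every group. It is not the dissertation's implementation. The function names `channel_shuffle_order` and `channel_shuffle` are hypothetical, and a plain Python list stands in for the channel axis of a feature map.

```python
def channel_shuffle_order(num_channels, groups):
    """Return the input channel index that lands at each output position.

    Hypothetical helper: channels are viewed as a (groups, per_group) grid
    and read out column by column, which interleaves the groups.
    """
    if groups <= 0 or num_channels % groups != 0:
        raise ValueError("num_channels must be divisible by groups")
    per_group = num_channels // groups
    # Output position j * groups + i takes input channel i * per_group + j.
    return [i * per_group + j for j in range(per_group) for i in range(groups)]


def channel_shuffle(channels, groups):
    """Reorder a list of per-channel items (assumed stand-in for feature maps)."""
    order = channel_shuffle_order(len(channels), groups)
    return [channels[k] for k in order]


if __name__ == "__main__":
    # Six channels in two groups: (0, 1, 2) and (3, 4, 5).
    print(channel_shuffle_order(6, 2))  # [0, 3, 1, 4, 2, 5]
```

With six channels in two groups, the shuffled order alternates between the groups, so a following grouped layer receives inputs from both. The operation only permutes channels and adds no parameters, which is why it is often used in lightweight designs.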
Keywords/Search Tags:Visual Object Tracking, Deep Neural Networks, Convolutional Neural Networks, Capsule Neural Networks, Siamese Neural Networks, Attention Mechanisms, Neural Architecture Search