Deep Visual Object Tracking Based on Representation Reinforcement and Decision Optimization

Posted on: 2021-04-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P Gao
GTID: 1368330614450933
Subject: Microelectronics and Solid State Electronics
Abstract/Summary:
Visual object tracking is a fundamental research problem in machine vision. It has many real-world applications and has attracted growing attention from researchers. Visual object tracking methods generally consist of two models: visual representation and tracking decision. During tracking, the tracking decision model continuously annotates the target object in the video sequence based on its visual representation. The renaissance of deep learning offers a major opportunity to improve tracking performance. Compared with the handcrafted features used in traditional tracking methods, deep features learned by convolutional neural networks are more discriminative and representative. Siamese networks further integrate visual representation and tracking decision into a unified framework, which allows the end-to-end training and inference advantages of deep learning to be fully exploited in visual object tracking. However, during tracking the appearance of the target object varies and the video sequences contain various complex interfering factors, which makes both visual representation and tracking decision challenging. This dissertation studies two aspects of deep visual object tracking methods, representation reinforcement and decision optimization, as follows.

(1) Non-end-to-end deep visual object tracking methods that use correlation filters for tracking decisions can be optimized rapidly with dense samples and high-dimensional deep features, but they are limited by the weak discriminative power of the ridge-regression-based decision optimization model, so their tracking performance can be further improved. This dissertation uses a support vector machine, which has strong discriminative power, to optimize the correlation filter, and proposes a tracking decision model based on the support vector filter. On the one hand, this model can be optimized with dense samples and high-dimensional deep features by means of circulant sampling and fast computation in the frequency domain; on the other hand, it treats visual object tracking as the problem of maximizing the classification margin between the target object and the surrounding background, which improves the discriminative power of the non-end-to-end deep visual object tracking method. In addition, because a single type of deep feature represents appearance variations of the target object inadequately, this dissertation uses multiple layers of complementary deep features to make the visual representation more robust, together with a multiple-confidence fusion strategy to obtain more accurate tracking results.

(2) Convolutional neural networks pretrained on static image classification datasets do not generalize well to dynamic visual object tracking tasks, because they cannot learn the category-discriminative information that separates the target object from the surrounding background, or the temporally coherent information of the target object in the video sequence. This dissertation investigates the attention mechanism in machine vision and designs a representation reinforcement model based on attentional learning. The model uses inter-frame and intra-frame attention during tracking to fully explore potentially critical information and selectively reinforce critical visual representations. In addition, a decision optimization model based on background-aware correlation filtering is constructed and embedded into the backbone network to improve the ability of the tracking decision to adapt to appearance variations of the target object. The decision model and the visual representation can be trained and inferred end to end, and the decision model can also be optimized online as the target object and the surrounding background change.

(3) Deep convolutional neural networks are complex and lose the target object's geometric details during downsampling, whereas shallow convolutional neural networks cannot obtain more discriminative semantic information through sequential inference. This dissertation investigates the hierarchical learning of deep features and designs a representation reinforcement model with a deep structure and symmetric topology that continuously extracts and aggregates different levels of geometric details and semantic information through iterative bottom-up and top-down inference. In addition, to improve training and inference efficiency and reduce the number of parameters, a lightweight design for deep structured convolutional neural networks is explored. Supported by this representation reinforcement, the dissertation further proposes a decision optimization model based on annotation box detection to achieve more accurate tracking.

(4) Current deep visual object tracking methods often generate additional scaled candidates and introduce redundant parameters to handle scale variations of the target object. To solve this problem, this dissertation constructs an extreme-points-aware decision optimization model. The target object is modeled by extreme points located on its four edges, and scale variation is estimated accurately by detecting the positions of these four extreme points. The model is simple and efficient, relying directly on the visual representation to estimate the target object's size. Moreover, considering the spatial sensitivity of the extreme points and the lack of semantic information in the edge regions of the target object, this dissertation proposes a representation reinforcement model based on parallel refinement across different convolutional layers. By exchanging information in parallel and adaptively fusing, in a top-down manner, geometric details and semantic information at different spatial resolutions, it provides the extreme-points-aware tracking decision with visual representations that have both high spatial resolution and hierarchical information.

In summary, this dissertation proposes four novel deep visual object tracking methods based on representation reinforcement and decision optimization. They improve the performance and efficiency of visual object tracking by addressing different problems and weaknesses of existing visual representation and tracking decision models. Evaluation results on several large visual object tracking benchmark datasets show that the proposed methods annotate target objects accurately and reliably in various complicated scenarios, and they provide a reference and guidance for the development of deep visual object tracking.
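
To make the correlation-filter baseline mentioned in part (1) more concrete, the following is a minimal sketch of a ridge-regression correlation filter trained with circulant (cyclically shifted) samples and solved element-wise in the frequency domain. It is not the support vector filter proposed in the dissertation. It assumes a single-channel 2-D patch instead of multi-layer deep features, and the function names (`gaussian_response`, `train_filter`, `detect`), the Gaussian label width `sigma`, and the regularization weight `lam` are hypothetical choices made only for illustration.

```python
import numpy as np


def gaussian_response(shape, sigma=2.0):
    """Desired response: a Gaussian peaked at index (0, 0), wrapped circularly."""
    h, w = shape
    # Circular distance of each row/column index to index 0.
    ys = np.minimum(np.arange(h), h - np.arange(h))
    xs = np.minimum(np.arange(w), w - np.arange(w))
    d2 = ys[:, None] ** 2 + xs[None, :] ** 2
    return np.exp(-0.5 * d2 / sigma ** 2)


def train_filter(x, y, lam=1e-2):
    """Closed-form ridge regression over all cyclic shifts of x.

    Circulant sampling diagonalizes the problem under the DFT, so the
    solution is an element-wise division in the frequency domain.
    Returns the filter in the frequency domain.
    """
    X = np.fft.fft2(x)
    Y = np.fft.fft2(y)
    return Y * np.conj(X) / (X * np.conj(X) + lam)


def detect(filt, z):
    """Apply the filter to a search patch z; return the (row, col) of the peak."""
    response = np.real(np.fft.ifft2(np.fft.fft2(z) * filt))
    dy, dx = np.unravel_index(np.argmax(response), response.shape)
    return int(dy), int(dx)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    template = rng.standard_normal((32, 32))   # stand-in for a feature map
    filt = train_filter(template, gaussian_response(template.shape))
    # Simulate target motion as a cyclic shift of the template.
    search = np.roll(template, (3, 5), axis=(0, 1))
    print(detect(filt, search))
```

In this sketch the location of the response peak gives the estimated cyclic displacement of the target between the template and the search patch. A support vector filter would keep the same circulant structure but replace the squared-error objective with a margin-based one.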
Keywords/Search Tags: Visual object tracking, deep learning, representation reinforcement, decision optimization