Research On Visual Object Tracking For Natural Scenes

Posted on: 2023-05-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D H Ge
GTID: 1528306905496914
Subject: Computer application technology
Abstract/Summary:
Visual object tracking is a fundamental research problem in computer vision and is widely used in video surveillance, autonomous driving, unmanned aerial vehicles, and human-computer interaction. Given the initial state of a target (i.e., its initial position and scale), the task is to continuously predict the target's state in the subsequent frames of a video sequence. Interfering factors during movement, such as appearance deformation, motion blur, illumination changes, partial occlusion, out-of-view motion, and in-plane rotation, make it difficult for traditional tracking methods to distinguish the target from the background effectively, which leads to tracking failure. Deep learning can automatically mine deep feature representations of the target from large-scale datasets, which effectively reduces the effect of these interfering factors and can significantly improve tracking performance. Current research on deep-learning-based tracking has concentrated on two aspects: how to extract robust feature representations and how to effectively learn the appearance changes of a target. However, existing methods have the following limitations:

(1) The construction of high-quality online training data has received little study, so it is difficult to ensure that the online training data correctly represent the dynamic changes of the target's appearance.
(2) Existing research mainly focuses on tracking under supervised conditions, and there is little research on how to use large-scale unlabeled data effectively to train tracking models.

In view of these limitations, and building on a summary and analysis of existing tracking methods, this dissertation studies visual object tracking for arbitrary types of targets by combining deep Siamese neural networks, self-paced
learning, self-attention mechanisms, and contrastive learning. The main contributions and innovations are as follows.

(1) To address the problem that existing methods for constructing online training data cannot correctly represent the target's appearance changes, this dissertation proposes a visual object tracking method based on self-paced densely connected networks. Assuming that the ground-truth response map of the target follows a two-dimensional Gaussian distribution, the reliability of a tracking result is measured by the similarity between the predicted response map and the ground-truth response map. A self-paced selection model is proposed that dynamically selects tracking results, in order of decreasing reliability, to construct the online training data; these data are then used to update the tracking model so that it learns the dynamic changes in the target's appearance. On this basis, the sample selection step of the self-paced selection model is adjusted adaptively to suit sequences of different tracking difficulty. Through a densely connected network, the method feeds the feature representation of each convolutional layer to all subsequent convolutional layers, which enhances the propagation and reuse of target features in the network. It also uses a feature pyramid network to fuse features from different levels of the backbone, enriching both the spatial and the semantic information of the target representation, strengthening its representational power, and thereby improving tracking performance. The proposed method achieves good tracking performance on four benchmark datasets.

(2) To address the dependence of existing tracking methods on a strong assumption, this dissertation proposes a visual object tracking method based on a reliable memory model. To evaluate the reliability of tracking results, an adaptive reliability evaluation strategy is designed by combining
the confidence of each tracking result with its similarity to historical tracking results. This strategy avoids the assumption that the ground-truth response map of the target follows a two-dimensional Gaussian distribution. To cope with the differing difficulty of different sequences, an adaptive reliability threshold adjustment mechanism is proposed, which allows the adaptive reliability evaluation strategy to generalize across sequences. In addition, inspired by the multi-level storage hierarchy of computers, an active-freeze memory model is designed to store all reliable tracking results for constructing online training data. The model consists of two sub-memories: the online training data stored in the active sub-memory are used to update the tracker, while the frozen sub-memory keeps all inactive reliable tracking results. Exchanging data between the two sub-memories keeps the online training data in the active sub-memory diverse, so the tracker avoids overfitting to the target's current appearance. The proposed method achieves excellent tracking performance on benchmark datasets and ensures that the online training data effectively represent the dynamic changes of the target's appearance.

(3) To address the difficulty of training tracking models on large-scale unlabeled data, this dissertation proposes a visual object tracking method based on a multi-head contrastive network under weakly supervised conditions. Based on an analysis of the properties of multi-level feature representations in deep neural networks under contrastive learning, a multi-head contrastive network is proposed. It combines a contrastive branch with a template branch and builds a separate embedding space for each convolutional layer of the backbone. Through self-supervision between the target response map of the contrastive
branch and that of the template branch, unlabeled data can be used effectively without manually labeling the ground-truth response map of the target. The embedding space of each convolutional layer learns feature invariances at its own level, which avoids interference between feature representations at different levels. To preserve the consistency of the spatial and channel dependencies of the feature representation while maintaining its semantic consistency, a global context consistency loss function is proposed. It further increases the similarity of feature representations across different appearances of the same target and improves the ability to distinguish the target from similar objects and the background. The proposed method achieves state-of-the-art tracking performance on benchmark datasets, demonstrating the effectiveness of training tracking models with large-scale unlabeled data.

(4) This dissertation integrates the three proposed tracking methods and a large number of related comparison methods into a visual object tracking testing platform based on the browser/server (B/S) architecture. The platform consists of an experiment preprocessing module, a training module, a testing module, and an experiment recording module. It encapsulates the different deep learning frameworks and test environments, integrates eight tracking algorithms behind a unified tracking interface, and forms a complete performance testing platform that can support ongoing research on visual object tracking and be extended to real-world application scenarios.
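The reliability measure and self-paced selection in contribution (1) can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, where the Gaussian is centred, or the rule for adjusting the selection step, so all three are assumptions here: cosine similarity, a Gaussian centred on the predicted map's own peak, and a hypothetical step proportional to the sequence's mean reliability. Response maps are plain nested lists to keep the sketch dependency-free.

```python
import math
from typing import List, Sequence

Map = List[List[float]]  # a response map stored as rows of scores


def gaussian_label(h: int, w: int, cy: float, cx: float, sigma: float = 1.0) -> Map:
    """Ideal response map: a 2-D Gaussian centred at (cy, cx)."""
    return [[math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
             for x in range(w)] for y in range(h)]


def cosine_similarity(a: Map, b: Map) -> float:
    """Cosine similarity of two same-shaped maps (assumed similarity measure)."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    na = math.sqrt(sum(v * v for v in fa))
    nb = math.sqrt(sum(v * v for v in fb))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(fa, fb)) / (na * nb)


def reliability(response: Map, sigma: float = 1.0) -> float:
    """Similarity between a predicted map and a Gaussian centred on its own peak."""
    h, w = len(response), len(response[0])
    cy, cx = max(((y, x) for y in range(h) for x in range(w)),
                 key=lambda p: response[p[0]][p[1]])
    return cosine_similarity(response, gaussian_label(h, w, cy, cx, sigma))


def adaptive_step(scores: Sequence[float], base_step: int) -> int:
    """Hypothetical rule: sequences with lower mean reliability admit samples more slowly."""
    return max(1, round(base_step * sum(scores) / len(scores)))


def self_paced_select(scores: Sequence[float], step: int, round_idx: int) -> List[int]:
    """Indices of the (round_idx + 1) * step most reliable results, most reliable first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[: (round_idx + 1) * step]
```

In a tracker, `self_paced_select` would be called once per update round, so the online training pool grows from the most reliable results toward less reliable ones as `round_idx` increases.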
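The dense connectivity in contribution (1) means that each layer in a block receives the concatenation of the block input and the outputs of every earlier layer. The sketch below shows only that wiring and is not the dissertation's network: a flat channel vector stands in for a convolutional feature map, and the layers are arbitrary hypothetical callables rather than convolutions.

```python
from typing import Callable, List, Sequence

Vector = List[float]                 # stand-in for a feature map's channels
Layer = Callable[[Vector], Vector]   # hypothetical layer: features in, new features out


def concat(parts: Sequence[Vector]) -> Vector:
    """Channel-wise concatenation of feature vectors."""
    return [v for part in parts for v in part]


def dense_block(x: Vector, layers: Sequence[Layer]) -> Vector:
    """Each layer sees the concatenation of the input and all earlier layer outputs."""
    features: List[Vector] = [x]
    for layer in layers:
        features.append(layer(concat(features)))
    # The block output keeps every intermediate feature, enabling reuse downstream.
    return concat(features)
```

Because the input grows with every layer, later layers can reuse early, spatially detailed features directly instead of relying on them surviving a chain of transformations.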
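Contribution (2) pairs an adaptive reliability test with an active-freeze memory. The sketch below is one hypothetical reading of that design, not the author's implementation: reliability is a weighted sum of confidence and similarity to historical results, the threshold scales with the sequence's running mean score, the active pool has a fixed capacity whose oldest entries are frozen, and `exchange` swaps randomly chosen frozen samples back in. The weights, the threshold rule, and the swap policy are all assumed.

```python
import random
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Deque, List


@dataclass
class ActiveFreezeMemory:
    """Hypothetical two-level memory: a bounded active pool used for online
    updates and an unbounded frozen pool holding older reliable results."""
    active_capacity: int
    alpha: float = 0.5   # assumed weight of confidence vs. historical similarity
    ratio: float = 0.8   # assumed: threshold = ratio * running mean score
    active: Deque[Any] = field(default_factory=deque)
    frozen: List[Any] = field(default_factory=list)
    scores: List[float] = field(default_factory=list)

    def reliability(self, confidence: float, similarity: float) -> float:
        # Combine the result's confidence with its similarity to past results.
        return self.alpha * confidence + (1.0 - self.alpha) * similarity

    def threshold(self) -> float:
        # Adaptive threshold tied to this sequence's own score history;
        # with no history yet, every result is accepted.
        if not self.scores:
            return 0.0
        return self.ratio * sum(self.scores) / len(self.scores)

    def add(self, sample: Any, confidence: float, similarity: float) -> bool:
        score = self.reliability(confidence, similarity)
        reliable = score >= self.threshold()
        self.scores.append(score)
        if reliable:
            self.active.append(sample)
            if len(self.active) > self.active_capacity:
                self.frozen.append(self.active.popleft())  # freeze the oldest
        return reliable

    def exchange(self, k: int, rng: random.Random) -> None:
        # Swap up to k distinct frozen samples with the k oldest active ones.
        k = min(k, len(self.frozen), len(self.active))
        picked = sorted(rng.sample(range(len(self.frozen)), k), reverse=True)
        thawed = [self.frozen.pop(i) for i in picked]  # descending keeps indices valid
        for _ in range(k):
            self.frozen.append(self.active.popleft())
        self.active.extend(thawed)
```

Calling `exchange` periodically, rather than only appending new samples, is what keeps older appearances in the update set; a purely first-in-first-out active pool would drift toward the target's most recent appearance.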
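The abstract does not give the objective used in contribution (3), so this sketch swaps in the standard InfoNCE loss purely to illustrate the structural idea of a separate embedding space per backbone layer: each layer has its own projection head, a loss is computed independently in each space, and the per-layer losses are summed. The layer names, the identity projection heads in the usage check, and the temperature are hypothetical, and neither the response-map self-supervision nor the global context consistency loss is modelled.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def project(weights: Matrix, v: Vector) -> Vector:
    """Linear projection head: one row of `weights` per output dimension."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na > 0 and nb > 0 else 0.0


def info_nce(anchors: Sequence[Vector], positives: Sequence[Vector], tau: float = 0.1) -> float:
    """Standard InfoNCE: positives[i] pairs with anchors[i]; the other
    entries of `positives` in the batch act as negatives."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / tau for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


def multi_head_loss(view_a: Dict[str, List[Vector]], view_b: Dict[str, List[Vector]],
                    heads: Dict[str, Matrix], tau: float = 0.1) -> float:
    """Sum of per-layer InfoNCE losses, each in that layer's own embedding space."""
    loss = 0.0
    for layer, weights in heads.items():
        za = [project(weights, v) for v in view_a[layer]]
        zb = [project(weights, v) for v in view_b[layer]]
        loss += info_nce(za, zb, tau)
    return loss
```

Keeping one head per layer means shallow and deep features are never forced into a shared space, which is the property the abstract attributes to the per-layer embedding spaces.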
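The unified interface described in contribution (4) can be sketched as an abstract tracker class plus a registry, so that every integrated algorithm is driven by the same `init`/`update` calls and scored the same way. This in-process sketch is an assumption about the design: it ignores the browser/server split and the framework encapsulation, and `StaticTracker` is a trivial placeholder, not one of the eight integrated algorithms. Boxes are assumed to be `(x, y, w, h)`, and the score is mean IoU over the frames after the first.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Sequence, Tuple, Type

Box = Tuple[float, float, float, float]  # (x, y, w, h), assumed box format


class Tracker(ABC):
    """Common interface that every integrated tracker is wrapped behind."""

    @abstractmethod
    def init(self, frame: object, box: Box) -> None: ...

    @abstractmethod
    def update(self, frame: object) -> Box: ...


REGISTRY: Dict[str, Type[Tracker]] = {}


def register(name: str) -> Callable[[Type[Tracker]], Type[Tracker]]:
    """Class decorator that makes a tracker selectable by name."""
    def decorator(cls: Type[Tracker]) -> Type[Tracker]:
        REGISTRY[name] = cls
        return cls
    return decorator


@register("static")
class StaticTracker(Tracker):
    """Placeholder tracker that always returns its initial box."""

    def init(self, frame: object, box: Box) -> None:
        self.box = box

    def update(self, frame: object) -> Box:
        return self.box


def iou(a: Box, b: Box) -> float:
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def run_sequence(name: str, frames: Sequence[object], gt_boxes: Sequence[Box]) -> float:
    """Initialise on frame 0 with its ground-truth box, then return mean IoU on the rest."""
    tracker = REGISTRY[name]()
    tracker.init(frames[0], gt_boxes[0])
    ious = [iou(tracker.update(f), g) for f, g in zip(frames[1:], gt_boxes[1:])]
    return sum(ious) / len(ious)
```

Adding a new algorithm then only requires a wrapper class with the `register` decorator; the evaluation loop stays unchanged.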
Keywords/Search Tags: Visual Object Tracking, Self-Paced Densely Connected Network, Reliable Memory Network, Multi-Head Contrastive Network, Visual Object Tracking Testing Platform