Visual Object Tracking Based On Attention Mechanism

Posted on: 2022-09-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F Du
Full Text: PDF
GTID: 1488306569987229
Subject: Computer Science and Technology
Abstract/Summary:
Visual object tracking is an important research direction in computer vision, with a wide range of applications in fields such as intelligent monitoring, human-computer interaction, and autonomous driving. Given only the initial position and size of an arbitrary target in a video, a visual tracker must continuously estimate the subsequent states of the target without using other prior information. Factors such as target deformation, occlusion, illumination changes, and background clutter pose formidable challenges to the accuracy, robustness, and real-time performance of object tracking algorithms. Tracking in complex environments is therefore a problem full of theoretical and practical challenges.

The attention mechanism is one of the ways in which humans filter information effectively. Humans selectively process the large amount of visual information received by the eye, thereby improving processing efficiency. The attention mechanism can likewise help a visual model use information selectively and guide the visual system to attend to the region of interest. The information in object tracking is mainly represented by image features, which contain not only useful information related to the target but also harmful information that interferes with the target and useless information unrelated to it. Effectively mining and using the useful information in features is essential to improving tracking performance. To address the feature utilization and learning problems in object tracking, this thesis carries out the following four studies, which use the attention mechanism to improve the utilization efficacy and representation ability of features.

First, a spatial-temporal adaptive feature weighted object tracking method based on spatial attention is proposed to solve the problem of model drift caused by background interference. The model in this method focuses its attention on the target. The spatial-temporal adaptive feature weights are constructed by combining the target likelihood map produced by a color histogram model with prior weights based on the distance between each pixel position and the target center. By introducing these weights into the correlation filter tracking framework, the tracking model can use target information effectively, weaken the influence of background information, and improve the efficacy of feature utilization. By introducing pseudo filter variables to retain the cyclic structure of the sample matrix in the correlation filter model, the method optimizes the loss function effectively. Experimental results show that the adaptive feature weights improve the performance of correlation filter methods and reduce the likelihood of model drift.

Second, a joint channel reliability and correlation filter learning object tracking method based on channel attention is proposed. It addresses the problem that different feature channels are not used selectively during tracking, which weakens the discriminative power of the model. This method focuses its attention on reliable feature channels. It assigns a reliability weight to each channel, introduces the weights as learnable variables into the correlation filter model, and constructs prior weights from the historical tracking results to regularize the current channel weights. The method jointly learns the current channel weights and the correlation filters, solving the joint problem by alternating iteratively between the two variables. The channel weight sub-problem is optimized effectively by deriving an upper bound of the loss function. Joint learning enables the model to use the information in reliable channels effectively, weaken the influence of unreliable channels, and improve the efficacy of feature utilization. The learned model is more discriminative, and the learned channel weights weight the channel responses more effectively during tracking. Experimental results show that the proposed joint learning model adaptively adjusts the contribution of different channels and improves the performance of the correlation filter method without significantly increasing the amount of computation.

Third, a correlation-guided attentional corner detection-based object tracking method is proposed. Tracking methods based on corner detection suffer from corner localization ambiguity and have difficulty exploiting the relationship between the target template and the search area to improve detection accuracy. The proposed method adopts a two-stage structure. In the first stage, a lightweight Siamese network estimates the target bounding box coarsely and narrows the search area for the corner detection module in the second stage. Before corner detection in the second stage, a spatial attention model guided by pixel-wise correlation results and a channel attention model guided by channel-wise correlation results explore the relationship between the template and the search area in different ways. The attention models enhance the spatial information of the bounding box corners in the features and the ability to discriminate the corners, thereby improving the feature representation ability. Experimental results show that the proposed method locates the corners of the bounding box effectively and achieves state-of-the-art tracking performance via bounding box corner detection.

Fourth, a dual attentional boundary detection-based object tracking method is proposed. The target bounding box is obtained by predicting its boundaries during tracking, which addresses the difficulty that commonly used bounding box regression methods have in estimating the target bounding box accurately. In the first stage, the method roughly estimates the bounding box with the same lightweight Siamese network used in the corner detection method. In the second stage, it learns a feature map for each boundary of the target bounding box and obtains an accurate bounding box by directly estimating the boundary positions. To detect the boundaries more reliably, a template integration module and dual attention models are proposed. The template integration module takes the pixel-wise correlation results between the template and the search area features as additional features, which introduces target outline information and makes the features more discriminative with respect to the target boundary. The dual attention models consist of a target-aware attention module and a boundary-aware attention module. The target-aware attention module enhances the target area to improve the discrimination between the target and the background; the boundary-aware attention module learns the spatial information of the boundary and enhances the boundary region. The dual attention models adjust the model's focus region in different ways and improve the representation ability of the boundary detection features. Experimental results show that the proposed template integration module and dual attention models improve the performance of boundary detection. The proposed tracking method obtains accurate bounding box estimates through boundary detection and achieves state-of-the-art tracking performance.

At the end of this thesis, the proposed methods are verified in an open environment. To meet the challenges of real tracking environments and improve the practicability and adaptability of the proposed methods, this thesis takes advantage of the strong discriminative power of correlation filter-based tracking and the strong generalization of Siamese network-based tracking. The Siamese network in the first stage of the corner detection method and the boundary detection method is combined with an online classifier that approximates correlation filtering. The resulting tracker compensates for the Siamese network's susceptibility to similar distractors, which improves its robustness and adaptability, while retaining the high accuracy of corner detection and boundary detection. The two combined methods offer high accuracy, high robustness, and real-time performance, which makes them better suited to practical application. Tracking experiments on security surveillance video show that the two methods handle the challenges of surveillance scenarios effectively and perform accurate, robust, and real-time tracking.

To address the problem of feature utilization and learning in object tracking, this thesis constructs different attention models for different problems. The research progresses from feature utilization to feature learning, and from the representation of attention in the correlation filtering framework to the learning of attention in deep networks. The attention mechanism is used to improve the utilization efficacy and representation ability of features, which effectively improves the performance of object tracking.
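
As an illustration of the first contribution, the following is a minimal sketch of how a color-histogram likelihood map and a distance-based prior might be combined into spatial weights. It is not the thesis's formulation: the function names, the Gaussian shape of the prior, the element-wise product used to combine the two maps, and the linear-interpolation histogram update (standing in for the temporal adaptation) are all assumptions made for illustration.

```python
import math


def distance_prior(height, width, sigma_scale=0.5):
    """Prior weight per pixel that decays with distance from the patch center.

    The Gaussian form and sigma_scale are assumptions, not taken from the thesis.
    """
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    sy, sx = sigma_scale * height, sigma_scale * width
    return [
        [math.exp(-0.5 * (((y - cy) / sy) ** 2 + ((x - cx) / sx) ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def target_likelihood(bin_map, fg_hist, bg_hist, eps=1e-6):
    """Per-pixel likelihood of belonging to the target.

    bin_map holds the color-bin index of each pixel; fg_hist and bg_hist are
    normalized foreground and background color histograms.
    """
    return [
        [fg_hist[b] / (fg_hist[b] + bg_hist[b] + eps) for b in row]
        for row in bin_map
    ]


def update_histogram(old_hist, new_hist, rate=0.04):
    """Blend the histogram from the current frame into the running model
    (hypothetical temporal update by linear interpolation)."""
    return [(1.0 - rate) * o + rate * n for o, n in zip(old_hist, new_hist)]


def adaptive_weights(bin_map, fg_hist, bg_hist):
    """Combine likelihood and prior (assumed element-wise product) and
    rescale so that the largest weight is 1."""
    height, width = len(bin_map), len(bin_map[0])
    like = target_likelihood(bin_map, fg_hist, bg_hist)
    prior = distance_prior(height, width)
    raw = [[like[y][x] * prior[y][x] for x in range(width)]
           for y in range(height)]
    peak = max(max(row) for row in raw)
    if peak == 0.0:
        return raw
    return [[v / peak for v in row] for row in raw]
```

In a correlation filter tracker, weights of this kind would multiply the feature map before the filter is learned, so that pixels that are both target-colored and close to the center contribute most; how the thesis integrates them into the loss function is not specified in this abstract.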
Keywords/Search Tags: Visual object tracking, Attention mechanism, Correlation filtering, Siamese networks, Feature utilization and learning