Research on a Long-Term Target Tracking Algorithm Based on Spatiotemporal Feature Enhancement

Posted on: 2024-07-18
Degree: Master
Type: Thesis
Country: China
Candidate: P S Sun
Full Text: PDF
GTID: 2568307106475914
Subject: Electronic information
Abstract/Summary:
Visual object tracking (VOT) is a fundamental task in computer vision that has been widely applied in real-world scenarios. Compared with short-term object tracking, long-term object tracking is more practical, so research on object tracking algorithms has shifted toward long-term tracking in recent years. After an in-depth analysis of the challenges faced by long-term object tracking, such as object deformation, interference from similar objects, and long-term occlusion, this paper proposes a long-term object tracking algorithm based on spatiotemporal feature enhancement. The algorithm improves the accuracy of the long-term tracker by using spatial and temporal information to enhance feature expression. The main research contents are as follows:

To address the problem that existing long-term object tracking algorithms make insufficient use of target appearance information, this paper proposes a long-term object tracking algorithm with a multi-level feature enhancement network. First, a dual feature enhancement module is introduced to optimize the spatial features extracted by the ResNet backbone network by focusing on important information in the width, height, and spatial dimensions of the feature map, thereby enabling the model to obtain stronger target appearance features. Then, a guided mask is used to help the model distinguish between foreground and background regions, improving the discriminative power of the features. Additionally, a learnable filter is applied in the frequency domain to automatically select features suited to the current tracker, highlighting effective information and suppressing useless information to produce cleaner features. Finally, a cohesive loss function is introduced to expand the range of difficult samples and penalize simple samples, alleviating sample imbalance. Experimental results show that the proposed algorithm outperforms state-of-the-art algorithms on four challenging datasets and maintains high accuracy and stability in the face of challenges such as target appearance and scale variations. Specifically, the proposed approach achieves an AUC score of 62.0% on TLP and a 1.7 percentage point improvement on VOT2020 LT.

This first work did not fully utilize the temporal information between consecutive video frames and therefore performed poorly under challenges such as interference from similar targets. To further improve the performance of long-term object tracking algorithms, this paper proposes a multi-scale structure-preservation and temporal memory fusion network that builds on the previous work. First, the network uses a multi-scale structure to fuse target features at different scales, enabling the model to consider both shallow edge information and deep semantic information, while high-order self-correlation modules enhance the near and far structural information of the target, achieving refined multi-scale feature enhancement. Second, the network uses a temporal memory fusion module to model the motion information between adjacent frames, while large-kernel convolutions expand the network's receptive field and enhance global contextual information, achieving deep mining of the spatial and temporal information in consecutive frame features. Finally, a novel trainable matching operator is proposed for the correlation fusion of the template branch and the test branch; it adaptively corrects the feature maps of the naive and deep correlation outputs to obtain more efficient matching features. In experiments on benchmark datasets, the algorithm exhibits excellent performance in long-term object tracking, and its success rate and accuracy score significantly exceed those of the first work, with a 1.3 percentage point improvement on the challenging LaSOT dataset.
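The abstract describes a loss that penalizes simple samples to ease sample imbalance but does not give its formula. As a rough illustration of that general idea only, the sketch below uses the well-known focal loss as a stand-in; it is not the thesis's cohesive loss, and the function names, the `gamma` default, and the sample probabilities are assumptions chosen for illustration.

```python
import math

EPS = 1e-12  # guards against log(0); an illustrative choice


def binary_cross_entropy(p, y):
    """Plain cross-entropy for one sample.

    p is the predicted foreground probability, y is the label (0 or 1).
    """
    p_t = p if y == 1 else 1.0 - p
    return -math.log(max(p_t, EPS))


def focal_loss(p, y, gamma=2.0):
    """Focal loss: cross-entropy scaled by (1 - p_t) ** gamma.

    Well-classified (easy) samples have p_t close to 1, so their
    contribution is shrunk; hard samples keep most of their weight.
    This is a stand-in, not the loss defined in the thesis.
    """
    p_t = p if y == 1 else 1.0 - p
    return ((1.0 - p_t) ** gamma) * binary_cross_entropy(p, y)


# Hypothetical predictions for two positive samples.
easy_p, hard_p = 0.95, 0.30

# Modulating factors with gamma = 2: (1 - 0.95)^2 = 0.0025, (1 - 0.30)^2 = 0.49
easy_ratio = focal_loss(easy_p, 1) / binary_cross_entropy(easy_p, 1)
hard_ratio = focal_loss(hard_p, 1) / binary_cross_entropy(hard_p, 1)
print(f"easy sample keeps {easy_ratio:.4f} of its cross-entropy")
print(f"hard sample keeps {hard_ratio:.4f} of its cross-entropy")
```

With `gamma=0` the function reduces to ordinary cross-entropy, so the exponent acts as a single knob for how strongly easy samples are down-weighted.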
Keywords/Search Tags: Long-term object tracking, Multi-level feature enhancement, Multi-scale fusion, Temporal information mining