Object tracking aims to track an object of interest stably and accurately across consecutive frames. It underpins downstream tasks such as scene analysis and planning decisions, which require an intelligent agent to perceive and understand its environment. The growing availability of different sensors now offers multiple ways to handle complex scenes; among them, videos and point clouds are penetrating all areas of production and daily life. Benefiting from the accumulation of massive data and the growth of computing power, deep learning has given new momentum to object tracking research. In this paper, we study object tracking algorithms in these two modalities based on the Siamese neural network. For video-based object tracking, many methods focus on designing modules for the Siamese network and on matching the template against the search area. However, they lack an in-depth analysis of discriminative feature learning. From the perspective of the loss function, we study the relationship between different instances within fully convolutional features. For object tracking in point clouds, a recent method takes inspiration from 2D visual tracking and uses a Siamese network to learn a similarity matching function, which ignores state estimation. We therefore aim to bridge the gap between target discrimination and state estimation and to connect them seamlessly. The main innovations are as follows:

(1) For discriminative feature learning in fully convolutional networks, this paper proposes an information-enhanced loss function that mines the underlying relationship between the template and the positive and negative instances of the search area. The proposed loss not only builds a densely connected structure but also reduces to the logistic loss of SiamFC and the triplet loss of SiamFC-tri as special cases. It is therefore a general formulation that can be used to improve a class of
Siamese network frameworks. In addition, the derivative analysis of our loss function shows that it accounts for global constraints in the search area and provides more feedback beneficial to network learning when the negative response is large. Extensive experiments on multiple datasets demonstrate that the proposed loss yields considerable improvements when applied to different baseline methods. In particular, on the OTB2015 and GOT-10k datasets, our method obtains improvements of 3.69% and 4% in success ratio, respectively.

(2) Because tracking in unstructured point clouds lacks target state estimation, this paper incorporates a traditional method into the Siamese network and proposes a progressive tracking framework. It tightly couples target state estimation with target discrimination and thereby improves 3D tracking performance in point clouds. To solve the target state estimation problem, we extend the traditional Lucas-Kanade method to the 3D tracking task: a Siamese network describes the transformation of the target object in 3D space, and the Jacobian matrix models the relationship between target appearance and geometric displacement. In this procedure, we also introduce column-wise finite-difference and learning-based methods to approximate the Jacobian matrix. Moreover, to increase the robustness of the target discrimination network, we propose a two-level feature fusion architecture that learns more discriminative features by fusing a hierarchical confidence loss and a point cloud completion loss. Finally, we propose an online tracking algorithm in which state estimation and target discrimination assist each other to achieve progressive target tracking and to handle challenging scenarios robustly. Experiments on the outdoor LiDAR dataset show that the proposed method achieves competitive performance in point cloud tracking, improving the success ratio by 13.68% compared with the
baseline.

(3) To further address candidate generation and the separate training of the state estimation and target discrimination subnetworks, this paper proposes a deep supervised descent method (SDM) with multiple seed generation for point cloud tracking, which combines the target state estimation and target discrimination tasks end to end. This method extends SDM to 3D LiDAR point clouds, which requires dynamically establishing the connection between appearance features and geometric displacement. We use bounding-box-aware pooling operations and a data-driven approach to learn a likelihood distribution of geometric displacements as a function of appearance features. In addition, because starting from a single state in the SDM procedure causes the tracking results to drift, we integrate multiple seed generation into the whole Siamese network, using deep Hough voting to explore the state representation space in parallel. More importantly, to train this end-to-end network, we design a multi-task loss function that balances the modules for state estimation, object discrimination, and multiple seed generation. Experiments on multiple point cloud datasets show that the proposed method achieves considerable performance and outperforms SETD by 11.3% in success ratio on KITTI.
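
As an illustration of the column-wise finite-difference approximation of the Jacobian mentioned in innovation (2), the following is a minimal sketch rather than the implementation used in this work. It assumes the feature extractor can be treated as a black-box function from a state vector to a feature vector; the names `finite_difference_jacobian`, `lk_update`, and `toy_feature`, the step size `eps`, and the linear toy model are all hypothetical and stand in for the Siamese network and the 3D target state.

```python
import numpy as np


def finite_difference_jacobian(feature_fn, state, eps=1e-3):
    """Approximate d(feature)/d(state), filling one column per state dimension."""
    state = np.asarray(state, dtype=float)
    base = np.asarray(feature_fn(state), dtype=float)
    jac = np.zeros((base.size, state.size))
    for k in range(state.size):
        perturbed = state.copy()
        perturbed[k] += eps  # perturb only the k-th state component
        shifted = np.asarray(feature_fn(perturbed), dtype=float)
        jac[:, k] = (shifted - base) / eps  # forward difference gives column k
    return jac


def lk_update(feature_fn, template_feat, state, eps=1e-3):
    """One Lucas-Kanade style (Gauss-Newton) step toward the template feature."""
    state = np.asarray(state, dtype=float)
    jac = finite_difference_jacobian(feature_fn, state, eps)
    residual = np.asarray(template_feat, dtype=float) - feature_fn(state)
    delta = np.linalg.pinv(jac) @ residual  # least-squares displacement
    return state + delta


# Hypothetical linear stand-in for the network: features = A @ state + b.
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0],
              [1.0, 1.0, 1.0]])
b = np.array([0.5, -1.0, 2.0, 0.0])


def toy_feature(state):
    return A @ np.asarray(state, dtype=float) + b


target_state = np.array([1.0, -2.0, 0.5])
estimate = lk_update(toy_feature, toy_feature(target_state), np.zeros(3))
```

For this linear toy model the finite-difference Jacobian matches `A` up to rounding, so a single update recovers `target_state`. With a nonlinear network the step would only be a local approximation and would normally be iterated.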