| Multi-object tracking is an important research direction in computer vision. Its main goal is to detect and locate multiple objects in videos or image sequences, assign each a unique identity, and keep the identity of each target as stable as possible over time, so that the trajectory of every object is obtained. In practice, multi-object tracking is widely used in intelligent transportation, intelligent security, medical processing, and autonomous driving. The detection-based tracking paradigm, which separates the detection and tracking tasks, allows researchers to focus on tracking and has therefore attracted considerable academic attention. Thanks to the development of deep learning, the accuracy of multi-object tracking keeps improving. However, problems remain, such as tracking errors caused by insufficient extraction of object features, difficulty in extracting object features when targets are occluded, and trajectory fragmentation. This paper studies detection-based multi-object tracking using deep learning. The main work and contributions are as follows: (1) The multi-sensor information fusion methods used by existing multi-object tracking algorithms for autonomous driving cannot fully exploit the synergy between sensors. To address this, a 3D multi-object tracking algorithm based on multi-modal feature fusion and learnable object similarity estimation is proposed. The multi-modal feature fusion module fuses image and point cloud features using a channel attention mechanism, further improving the expressive power of the multi-modal features. The object similarity estimation module generates the similarity matrix directly through the network and performs cross-modal joint reasoning among multiple objects in a learnable way, avoiding extensive manual parameter setting. Validation and ablation experiments are carried out on the KITTI dataset, and the results show that the proposed algorithm is superior
to other algorithms in accuracy and is more robust. The ablation experiments show that both modules are necessary: the feature fusion module lets the features of different modalities work together and improves tracking accuracy, while the object similarity estimation module removes the need to determine thresholds through manual experiments, saving considerable time. (2) High-dimensional semantic features, such as two-dimensional and three-dimensional features, are not suitable for direct fusion with low-dimensional features such as object location information. To address this, and to reduce the impact of false positive detections on tracking performance, a multi-object tracking algorithm based on distance fusion and a false detection filter is proposed. The false detection filter combines the detection confidence with the object score output by a false positive detection discrimination network to jointly judge each detection, so that true positive detections are retained as far as possible while a large number of false positive detections are filtered out. The distance fusion module computes weights for the feature distances from the fused object feature and fuses the distances computed from the different features according to these weights, so that the matching evidence provided by each modality's features can be used. Validation and ablation experiments on the KITTI dataset show that the proposed algorithm improves the final tracking performance. The false positive detection filter reduces false positive detections so that the tracker does not receive erroneous inputs, which fundamentally improves tracker training, and it can be easily embedded into other tracking frameworks. The feature distance weight estimation module within the distance fusion module computes the weights of the feature distances so that the features of the different modalities can play their role in the decision-making layer to
the maximum extent, thereby improving tracking accuracy. |
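
The false detection filter and the weighted distance fusion described in contribution (2) can be illustrated with a minimal sketch. It is not the implementation from this work. The abstract does not specify how the two scores are combined or how the weights are parameterized, so the following are assumptions made purely for illustration: a detection is kept if either its confidence or its object score clears a threshold, the weights arrive as one raw logit per feature type (standing in for the output of the weight estimation module), and they are normalized with a softmax. The names `filter_detections`, `fuse_distances`, `conf_thresh`, and `score_thresh` are hypothetical.

```python
import math


def filter_detections(detections, conf_thresh=0.5, score_thresh=0.5):
    """Keep a detection if either score clears its threshold.

    Assumption: an OR rule is used so that true positives are retained
    as far as possible; the thresholds are illustrative values only.
    Each detection is a dict with "confidence" (from the detector) and
    "object_score" (from a hypothetical discrimination network).
    """
    return [
        d for d in detections
        if d["confidence"] >= conf_thresh or d["object_score"] >= score_thresh
    ]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_distances(distance_matrices, weight_logits):
    """Fuse per-feature distance matrices into one matrix.

    distance_matrices: one (tracks x detections) matrix per feature type,
    for example appearance, point cloud, and location distances.
    weight_logits: one raw weight per feature type (assumed to come from
    a learned weight estimation module, simplified here to one global
    weight per feature type).
    """
    weights = softmax(weight_logits)
    rows = len(distance_matrices[0])
    cols = len(distance_matrices[0][0]) if rows else 0
    fused = [[0.0] * cols for _ in range(rows)]
    for weight, matrix in zip(weights, distance_matrices):
        for i in range(rows):
            for j in range(cols):
                fused[i][j] += weight * matrix[i][j]
    return fused
```

The fused matrix could then be passed to any assignment step, such as Hungarian matching, to associate tracks with the detections that survived the filter.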