3D multi-object tracking, as an important research direction in computer vision perception, is widely used in fields such as autonomous driving and robotic visual navigation. Camera images carry rich semantic information but lack depth, so they cannot effectively handle missed detections of occluded objects. Point clouds collected by lidar provide accurate position and geometric information about objects, but they are dense near the sensor and sparse at range, unordered, and unevenly distributed, so tracking performance on distant small objects is poor. Fusing lidar and camera information can therefore compensate for the shortcomings of each single sensor, yet existing fusion methods are not effective enough and fail to produce rich fused features. In addition, existing tracking methods tend to lose tracked objects with low detection confidence, and measuring object similarity with intersection over union (IoU) or Euclidean distance alone easily causes trajectory fragmentation and identity switches, so tracking performance degrades in complex scenes. To address these problems, this paper proposes 3D multi-object tracking methods based on the fusion of lidar and camera information. The main research contents are as follows:

A 3D multi-object tracking method with multimodal fusion and two-stage association is proposed. First, a multimodal feature adaptive gating fusion module is designed to adaptively fuse point-cloud features with point-wise image features, improving the tracking of small and occluded objects. Second, the detection and embedding branches are learned jointly, and a two-stage data association strategy is proposed that filters detections with high and low confidence thresholds and combines appearance and position information, effectively avoiding the loss of occluded objects during tracking. Experimental results on the KITTI dataset show that the proposed method reaches 75.59% higher order tracking accuracy (HOTA) and 87.62% multi-object tracking accuracy (MOTA), and its overall tracking performance surpasses existing advanced 3D multi-object tracking methods.

A 3D multi-object tracking method with multimodal fusion and multi-information association is proposed. First, a hybrid soft attention feature enhancement module is designed that uses a channel separation technique to strengthen the semantic information of image features. Second, a semantic-feature-guided multimodal fusion network is proposed to fuse point-cloud features, point-wise image features, and image features, improving the fusion of the modalities and yielding richer fused features. Meanwhile, a multi-information perception affinity matrix is constructed that associates trajectories and detections using IoU, Euclidean distance, appearance information, and object attributes, which increases the matching rate between trajectories and detections, reduces identity switches, and improves tracking performance. Evaluated on the KITTI tracking benchmark and compared with existing advanced tracking methods, the proposed method reaches 76.94% HOTA and 88.12% MOTA. Experimental results show that it can accurately track multiple objects in complex scenes and delivers more advanced tracking performance.
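To make the adaptive gating fusion concrete, the following is a minimal PyTorch-style sketch, not the module defined in the thesis: it assumes both modalities have already been projected to the same per-point feature dimension and learns a per-channel gate that weighs lidar features against image features sampled at the projected points.

```python
import torch
import torch.nn as nn


class AdaptiveGatedFusion(nn.Module):
    """Illustrative gated fusion of point-cloud and point-wise image features.

    Assumes both modalities already share the feature dimension `dim`;
    the gate layout is a guess, not the thesis architecture.
    """

    def __init__(self, dim: int):
        super().__init__()
        # Gate computed from the concatenated modalities, one weight per channel.
        self.gate = nn.Sequential(
            nn.Linear(2 * dim, dim),
            nn.Sigmoid(),
        )

    def forward(self, lidar_feat: torch.Tensor, image_feat: torch.Tensor) -> torch.Tensor:
        # lidar_feat, image_feat: (num_points, dim)
        g = self.gate(torch.cat([lidar_feat, image_feat], dim=-1))
        # Convex combination: g -> 1 favours lidar, g -> 0 favours image features.
        return g * lidar_feat + (1.0 - g) * image_feat


if __name__ == "__main__":
    fusion = AdaptiveGatedFusion(dim=64)
    pts = torch.randn(1024, 64)   # per-point lidar features
    img = torch.randn(1024, 64)   # image features sampled at projected points
    print(fusion(pts, img).shape)  # torch.Size([1024, 64])
```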
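The two-stage association strategy can likewise be sketched under stated assumptions: detections are split by confidence into high and low pools (the 0.6/0.2 thresholds, the cost weights, and the 10 m distance normaliser are illustrative values, not those of the thesis), tracks are matched to the high pool first, and the remaining tracks are matched to the low pool with a cost that mixes appearance and position cues.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def cost_matrix(tracks, dets, w_app=0.5, w_pos=0.5):
    """Combined cost: appearance cosine distance plus normalised centre distance.

    `tracks` and `dets` are lists of dicts with 'center' (3,) and 'embed' (D,)
    arrays -- an assumed interface, not the thesis data structure."""
    cost = np.zeros((len(tracks), len(dets)))
    for i, t in enumerate(tracks):
        for j, d in enumerate(dets):
            app = 1.0 - np.dot(t["embed"], d["embed"]) / (
                np.linalg.norm(t["embed"]) * np.linalg.norm(d["embed"]) + 1e-8)
            pos = np.linalg.norm(t["center"] - d["center"]) / 10.0  # assumed scale
            cost[i, j] = w_app * app + w_pos * pos
    return cost


def two_stage_associate(tracks, dets, high_thr=0.6, low_thr=0.2, max_cost=0.8):
    """Stage 1: match tracks to high-confidence detections.
    Stage 2: match the still-unmatched tracks to low-confidence detections."""
    high = [d for d in dets if d["score"] >= high_thr]
    low = [d for d in dets if low_thr <= d["score"] < high_thr]

    matches, unmatched_tracks = [], list(range(len(tracks)))
    for pool in (high, low):
        if not pool or not unmatched_tracks:
            continue
        sub_tracks = [tracks[i] for i in unmatched_tracks]
        cost = cost_matrix(sub_tracks, pool)
        rows, cols = linear_sum_assignment(cost)
        still_unmatched = set(range(len(sub_tracks)))
        for r, c in zip(rows, cols):
            if cost[r, c] <= max_cost:  # reject weak matches
                matches.append((unmatched_tracks[r], pool[c]))  # (track index, detection)
                still_unmatched.discard(r)
        unmatched_tracks = [unmatched_tracks[r] for r in sorted(still_unmatched)]
    return matches, unmatched_tracks
```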
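For the hybrid soft attention module, the abstract only states that channel separation is used to enhance image semantics; the sketch below therefore assumes one plausible layout, in which half of the channels receive channel attention and the other half spatial attention before being re-concatenated. It is an illustration of the idea, not the thesis design.

```python
import torch
import torch.nn as nn


class HybridSoftAttention(nn.Module):
    """Channel-separated soft attention (assumed layout): one half of the
    channels is reweighted per channel, the other half per spatial location."""

    def __init__(self, channels: int):
        super().__init__()
        assert channels % 2 == 0
        half = channels // 2
        self.channel_att = nn.Sequential(   # squeeze-and-excitation style gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(half, half, kernel_size=1),
            nn.Sigmoid(),
        )
        self.spatial_att = nn.Sequential(   # single-channel spatial mask
            nn.Conv2d(half, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, H, W); split the channels into two groups
        a, b = torch.chunk(x, 2, dim=1)
        a = a * self.channel_att(a)   # reweight channels
        b = b * self.spatial_att(b)   # reweight spatial locations
        return torch.cat([a, b], dim=1)
```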
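Finally, the multi-information perception affinity matrix can be illustrated entry by entry. The weights, the exponential distance kernel, and the axis-aligned bird's-eye-view IoU below are assumptions for the sketch rather than the values or box geometry used in the thesis.

```python
import numpy as np


def bev_iou(box_a, box_b):
    """Axis-aligned IoU of two BEV boxes given as (x_min, y_min, x_max, y_max);
    a simplification of the rotated-box IoU a real tracker would use."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-8)


def multi_info_affinity(track, det, weights=(0.3, 0.3, 0.3, 0.1)):
    """One entry of an assumed multi-information affinity matrix, combining
    BEV IoU, centre distance, appearance similarity, and an attribute check."""
    w_iou, w_dist, w_app, w_attr = weights

    iou = bev_iou(track["box"], det["box"])                        # in [0, 1]
    dist = np.exp(-np.linalg.norm(track["center"] - det["center"]) / 10.0)
    app = float(np.dot(track["embed"], det["embed"]) /
                (np.linalg.norm(track["embed"]) * np.linalg.norm(det["embed"]) + 1e-8))
    attr = 1.0 if track["label"] == det["label"] else 0.0          # same class?

    return w_iou * iou + w_dist * dist + w_app * app + w_attr * attr
```

In a full tracker, this score would be computed for every track-detection pair and the resulting affinity matrix handed to a Hungarian or greedy matcher, as in the two-stage association sketch above.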