
Research On 3D Object Detection And Tracking Algorithm Based On Point Cloud And Image Fusion For Intelligent Vehicle

Posted on: 2024-07-08
Degree: Master
Type: Thesis
Country: China
Candidate: J W Jiang
Full Text: PDF
GTID: 2542307064983279
Subject: Vehicle Engineering
Abstract/Summary:
3D object detection and tracking, the core task of the autonomous driving perception module, must detect the position, size, orientation, and speed of each object in 3D space while recognizing its category, and must output the object's historical trajectory, which provides a sufficient basis for subsequent prediction and decision-making. LiDAR and camera, two common sensors in the intelligent vehicle perception system, provide different modalities for 3D perception tasks. The point cloud generated by LiDAR provides accurate depth and geometric information, but it is sparse and carries only shape information, so its descriptive ability is insufficient for objects that are distant, small, or have incomplete shapes. The images generated by a camera are regular and dense and provide rich texture and color information, but depth information is lost, which makes high-precision three-dimensional localization difficult. A single sensor therefore struggles to meet the intelligent vehicle's demands for accuracy and robustness in the perception system. Multi-modal fusion perception based on multiple sensors is the future development trend and a current research hotspot. Traditional sensor fusion mainly uses result-level late fusion, which does not fully exploit the advantages of fusion. This paper is therefore dedicated to the deep fusion of point cloud and image in order to improve the accuracy of 3D object detection and tracking. The main research content of this paper comprises the following four parts:

(1) Research on 3D object detection algorithm based on point cloud and image fusion at input level

To address the difficulty of detecting distant and small objects caused by the sparsity of the point cloud, this paper proposes an early-fusion method, MaskDensing, which uses image instance segmentation results to densify the point cloud and enhance its semantics at the input level. The real point cloud is first projected onto the mask image, and virtual points are sampled within the valid mask region. The depth of the nearest real point is then assigned to each virtual point, and a virtual point cloud is generated according to the projection matrix between the LiDAR and the camera to densify the foreground point cloud. After that, the category information corresponding to the mask is appended to the point cloud as a one-hot encoding to achieve semantic enhancement. To overcome the information loss of hard voxel encoding and reduce the amount of computation, this paper introduces dynamic voxel encoding and adds a voxel geometry encoding to balance the category and geometric dimensions. Experiments on the nuScenes dataset show that, compared with the baseline algorithm, the proposed input-level fusion algorithm improves detection results for all categories, especially for long-distance and small objects.

(2) Research on 3D object detection algorithm based on point cloud and image fusion at feature level

To address the problem that the input-level fusion algorithm cannot fully utilize the rich contextual features of the image, this paper proposes DeformFusion, a point cloud and image feature-level fusion algorithm based on the transformer architecture. In the point cloud branch, a hotspot decoder is proposed that generates hotspots to initialize the object queries with classification assistance, which effectively addresses the slow convergence of the transformer. For feature-level fusion, inspired by Deformable DETR, a deformable spatial-constraint feature aggregation module is proposed, which uses the initial detection results of the point cloud branch as reference points and adaptively fuses image features through learnable offsets to refine the initial detection results. Experiments on the nuScenes dataset show that the proposed feature-level fusion algorithm is superior to the input-level fusion algorithm. The ablation experiments also show the advantages of the transformer architecture and of point cloud and image feature-level fusion.

(3) Research on 3D object detection algorithm based on point cloud and image fusion at input level and feature level

The sequential point cloud and image feature-level fusion algorithm proposed in this paper depends too heavily on the initial detection results of the first-stage point cloud branch. Considering that the input-level fusion algorithm can easily be nested into other pipelines, and combining the advantages of both approaches, this paper proposes MixFusion, which fuses point cloud and image features at the input level and the feature level at the same time. In the first stage, the input-level fusion algorithm is used to improve the recall of point cloud detection; the detection results are then further refined by the second-stage feature-level fusion. Experiments on the nuScenes dataset show that the hybrid fusion algorithm is superior to both the pure input-level fusion algorithm and the pure feature-level fusion algorithm, which verifies the effectiveness of the proposed hybrid fusion algorithm.

(4) Research on 3D object tracking algorithm based on point cloud and image fusion

To address the problem of identity switches in previous 3D object tracking work, this paper uses the high-precision 3D detection results of MixFusion and combines point cloud and image fusion features to propose a multi-modal fusion 3D object tracking algorithm, DeepTrack3D. In the motion model, the Kalman filter is adopted and extended to three dimensions. In the data association module, a cascade matching strategy is proposed, which combines point cloud and image semantic fusion features with bounding box geometric features to generate the cost matrix. In addition, a geometric matching strategy based on 3D GIoU is proposed. In the lifecycle management module, a two-stage lifecycle management strategy is proposed to maintain track continuity. Experiments on the nuScenes dataset show that the proposed point cloud and image fusion 3D object tracking algorithm is superior to the previous algorithm.
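
As an illustration of the input-level densification described in part (1), the following is a minimal sketch, not the thesis's implementation. It assumes that the points are already expressed in the camera frame, that the camera is a simple pinhole model with intrinsics `fx`, `fy`, `cx`, `cy`, and that the instance mask is given as a set of pixel coordinates. The function name `densify_with_mask` and all of its parameters are hypothetical, and the LiDAR-to-camera extrinsic transform is omitted for brevity.

```python
import random


def densify_with_mask(points_cam, mask, label, num_classes,
                      fx, fy, cx, cy, n_virtual, seed=0):
    """Generate virtual points inside one instance mask (illustrative only).

    points_cam : list of (x, y, z) real points, assumed to be in the camera frame
    mask       : set of (u, v) integer pixels belonging to one instance
    label      : integer class index of that instance
    Returns a list of [x, y, z, one-hot...] virtual points.
    """
    # Step 1: project real points to the image and keep those inside the mask.
    real_in_mask = []
    for x, y, z in points_cam:
        if z <= 0:  # point is behind the camera
            continue
        u = fx * x / z + cx
        v = fy * y / z + cy
        if (round(u), round(v)) in mask:
            real_in_mask.append((u, v, z))
    if not real_in_mask:
        return []  # no depth reference available for this mask

    # Step 2: sample virtual pixels inside the mask (uniform sampling is an assumption).
    rng = random.Random(seed)
    pixels = sorted(mask)
    samples = rng.sample(pixels, min(n_virtual, len(pixels)))

    # Step 3: one-hot class vector appended to every virtual point.
    one_hot = [1.0 if i == label else 0.0 for i in range(num_classes)]

    virtual = []
    for u, v in samples:
        # Borrow the depth of the nearest real point in pixel space.
        _, _, z = min(real_in_mask,
                      key=lambda r: (r[0] - u) ** 2 + (r[1] - v) ** 2)
        # Back-project the pixel to 3D with the borrowed depth.
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        virtual.append([x, y, z] + one_hot)
    return virtual
```

In a full pipeline the virtual points would be transformed back into the LiDAR frame and concatenated with the real points before voxelization; that step depends on the sensor calibration and is left out here.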
Keywords/Search Tags:Intelligent Vehicle, Multimodal Fusion Perception, Point Cloud, Image, 3D Object Detection, 3D Object Tracking