
Research on LiDAR Segmentation Based on Multi-Modal Fusion

Posted on: 2024-09-01
Degree: Master
Type: Thesis
Country: China
Candidate: Q Yu
Full Text: PDF
GTID: 2568307067494544
Subject: Electronic information
Abstract/Summary:
In autonomous driving scene perception, most open-source datasets are multimodal, mainly covering two modalities: LiDAR point clouds and RGB camera images. The advantage of LiDAR is that it accurately measures the three-dimensional coordinates of objects in the scene; its drawback is that the acquired data is sparse and discretely distributed in 3D space, essentially carrying only the positional information of the coordinates themselves. RGB cameras, in contrast, provide rich color and texture information, and their structured data is more amenable to network learning, but they lack crucial depth information. How to better exploit and fuse the data from these two modalities has therefore become a mainstream research direction.

Among perception tasks, segmentation provides the most detailed scene understanding: it assigns a category to every point and, in panoptic segmentation, additional instance information, so that all objects in the scene can be recognized and classified and occlusion and overlap between objects are handled better. However, progress on segmentation is limited by the high cost of data annotation. Training an effective segmentation model requires a large amount of point-level annotated data with very strict labeling detail, making the annotation process time-consuming and labor-intensive.

Based on this situation, this thesis proposes two novel algorithms and one semi-supervised scheme, addressing multimodal fusion, instance clustering for panoptic segmentation, and annotation-cost reduction. The main contributions are as follows:

· A multi-level multimodal fusion method for 3D semantic segmentation. It introduces a point-level detail-feature fusion strategy that projects the point cloud onto the image through the LiDAR-to-camera transformation matrix to obtain image-modality features for each point, together with a multi-head self-attention structure of linear computational complexity that fuses high-level semantic features while keeping computation within an acceptable range and preserving the original feature-learning capability. Experiments on the open-source nuScenes dataset demonstrate the effectiveness of each module, achieving the best results on the validation set.

· A high-precision multimodal panoptic segmentation method built on the semantic segmentation method above. It carefully examines how panoptic labels improve the semantic segmentation backbone and designs a more reasonable instance clustering scheme. For large objects that are hard to cluster, a secondary clustering module reuses image detail texture to merge center-point predictions. Experiments on nuScenes demonstrate the effectiveness of each module, achieving the best results on the validation set and third place on the test-set leaderboard.

· A multimodal semi-supervised semantic segmentation scheme that replaces point-level semantic annotations with lower-cost image object-detection bounding boxes. Through projection and a heatmap mechanism, the bounding-box information is fed back into point cloud pseudo-labels, and knowledge distillation lets the corrected pseudo-labels boost model training.
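The point-level fusion in the first contribution starts from a standard LiDAR-to-image projection: homogeneous points are transformed by the extrinsic matrix into the camera frame, then projected by the intrinsic matrix. The sketch below illustrates that step only; the matrix names and shapes are generic assumptions, not the thesis' exact interface.

```python
import numpy as np

def project_points_to_image(points, T_cam_lidar, K, img_hw):
    """Project LiDAR points (N, 3) into the image plane.

    T_cam_lidar: 4x4 extrinsic (LiDAR -> camera frame).
    K: 3x3 camera intrinsic matrix. img_hw: (height, width).
    Returns in-view pixel coords (M, 2) and a boolean mask over all N points.
    """
    n = points.shape[0]
    homo = np.hstack([points, np.ones((n, 1))])          # homogeneous (N, 4)
    cam = (T_cam_lidar @ homo.T).T[:, :3]                # points in camera frame
    in_front = cam[:, 2] > 1e-6                          # drop points behind camera
    uv = (K @ cam.T).T                                   # perspective projection
    uv = uv[:, :2] / uv[:, 2:3]                          # divide by depth
    h, w = img_hw
    in_view = in_front & (uv[:, 0] >= 0) & (uv[:, 0] < w) \
                       & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return uv[in_view], in_view
```

Once the pixel coordinates are known, image features (or RGB values) at those locations can be gathered and concatenated to the per-point LiDAR features.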
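The abstract does not specify which linear-complexity attention variant is used; one common way to get O(N) cost is the kernel-feature-map formulation, where softmax(QKᵀ)V is replaced by φ(Q)(φ(K)ᵀV) with a positive feature map φ. The sketch below shows that general idea under those assumptions (single head, square projections), not the thesis' specific design.

```python
import numpy as np

def linear_self_attention(x, Wq, Wk, Wv):
    """Self-attention with O(N) complexity via a kernel feature map.

    Instead of softmax(Q K^T) V, which costs O(N^2), this computes
    phi(Q) (phi(K)^T V) with phi(u) = elu(u) + 1 so weights stay positive.
    x: (N, d) tokens; Wq, Wk, Wv: (d, d) projection matrices (illustrative).
    """
    def phi(u):
        return np.where(u > 0, u + 1.0, np.exp(u))       # elu(u) + 1 > 0

    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    kv = phi(K).T @ V                                    # (d, d), cost O(N d^2)
    z = phi(Q) @ phi(K).sum(axis=0, keepdims=True).T     # (N, 1) normalizer
    return (phi(Q) @ kv) / z                             # convex mix of V rows
```

Because the (d, d) summary `kv` is shared across queries, the cost grows linearly in the number of points rather than quadratically, which is what makes attention over dense point features affordable.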
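For the semi-supervised scheme, the projection-plus-heatmap idea can be sketched as follows: each projected point is scored by a Gaussian heatmap centered on a detection box, and points scoring above a threshold inherit the box's class as a pseudo-label. The function name, Gaussian width, and threshold here are illustrative assumptions, not the thesis' exact mechanism.

```python
import numpy as np

def box_heatmap_pseudo_labels(uv, boxes, labels, sigma_scale=0.25, thresh=0.3):
    """Turn 2D detection boxes into per-point pseudo-labels.

    uv: (N, 2) projected pixel coords of LiDAR points.
    boxes: (B, 4) boxes as (x1, y1, x2, y2); labels: (B,) class ids.
    A Gaussian centered on each box scores every point inside it; points whose
    best score exceeds `thresh` take that box's class, others stay -1 (ignore).
    """
    n = uv.shape[0]
    best = np.zeros(n)
    pseudo = np.full(n, -1, dtype=int)
    for (x1, y1, x2, y2), cls in zip(boxes, labels):
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0        # heatmap center
        sx, sy = sigma_scale * (x2 - x1), sigma_scale * (y2 - y1)
        inside = (uv[:, 0] >= x1) & (uv[:, 0] <= x2) & \
                 (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
        score = np.exp(-((uv[:, 0] - cx) ** 2 / (2 * sx ** 2)
                         + (uv[:, 1] - cy) ** 2 / (2 * sy ** 2))) * inside
        update = score > best                            # keep best-scoring box
        pseudo[update] = cls
        best = np.maximum(best, score)
    pseudo[best < thresh] = -1                           # low-confidence -> ignore
    return pseudo
```

Pseudo-labels produced this way are noisy at box boundaries, which is presumably why the thesis additionally corrects them and applies knowledge distillation before using them as training targets.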
Keywords/Search Tags: LiDAR point cloud, multi-modal, semantic segmentation, panoptic segmentation, semi-supervised learning