Research On Point-voxel-based LiDAR 3D Object Detection

Posted on: 2023-05-19
Degree: Master
Type: Thesis
Country: China
Candidate: N Song
Full Text: PDF
GTID: 2568306767463454
Subject: Photogrammetry and Remote Sensing
Abstract/Summary:
With the rapid development of 3D perception devices such as structured light sensors, LiDAR and Kinect, research in related fields has grown steadily, and practical applications such as virtual reality, augmented reality, robotics and autonomous driving have received increasing attention. As a crucial task in 3D scene perception and understanding, 3D object detection plays an important role in these applications and is the cornerstone of many downstream tasks, such as motion and tracking. Compared with indoor detection, LiDAR 3D object detection has to cope with complex street scenes and emergencies, and consequently the related models have to meet stricter efficiency and accuracy requirements. At present, LiDAR 3D object detection mainly relies on point cloud data scanned by LiDAR, which covers a surround view and is sparse. Existing LiDAR 3D object detection methods can generally be divided into two categories according to how they perceive the scene: methods based on voxel perception and methods based on point cloud perception. Voxel representations help locate objects quickly and efficiently, while point cloud representations describe the spatial relationships within objects and thereby help refine the detected objects.
This thesis combines the characteristics and advantages of these two representations and proposes a novel two-stage LiDAR 3D object detection network, called Joint Point-Voxel Network (JPV-Net). Specifically, the network framework includes the dual encoder-fusion decoder proposed in this thesis, which consists of two encoders with different functions and designs and a feature fusion decoder. The encoders extract coarse voxel features of the 3D scene and point features rich in geometric context, while the decoder adopts an attention mechanism to fuse the two kinds of features from coarse to fine and gradually propagates the features back to the original resolution. In addition, to further exploit the characteristics of the voxel CNN (Convolutional Neural Network) and the point cloud perception network, this thesis designs two IoU (Intersection over Union) estimation modules, one for the proposal stage and one for the refinement stage, both of which effectively alleviate the mismatch between object localization quality and classification confidence.
Furthermore, although driving vehicles gather a large number of 3D scene samples, labeling 3D scenes costs much more than labeling images and other data types. As a result, LiDAR 3D object detection tasks come with a large number of unlabeled samples that are rich in valuable information. To make full use of these samples, this thesis introduces a semi-supervised learning method to improve existing networks. JPV-Net is evaluated on multiple benchmark datasets, including the KITTI dataset and the ONCE dataset, and the experimental results both validate the effectiveness of the network and support the use of semi-supervised learning.
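The IoU (Intersection over Union) quantity that the two estimation modules target can be illustrated with a minimal sketch. The abstract does not specify how JPV-Net computes or predicts IoU, so the following is only an assumption-laden illustration: it uses axis-aligned boxes given as `(x_min, y_min, z_min, x_max, y_max, z_max)`, whereas LiDAR detectors usually work with rotated boxes, and the function name `iou_3d_axis_aligned` is hypothetical.

```python
from typing import Sequence


def iou_3d_axis_aligned(box_a: Sequence[float], box_b: Sequence[float]) -> float:
    """IoU of two axis-aligned 3D boxes.

    Each box is (x_min, y_min, z_min, x_max, y_max, z_max).
    Assumption for illustration only: no rotation (yaw) is modeled.
    """
    intersection = 1.0
    for axis in range(3):
        # Overlap interval of the two boxes along this axis.
        low = max(box_a[axis], box_b[axis])
        high = min(box_a[axis + 3], box_b[axis + 3])
        if high <= low:
            return 0.0  # No overlap along this axis, so no shared volume.
        intersection *= high - low

    def volume(box: Sequence[float]) -> float:
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    union = volume(box_a) + volume(box_b) - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 1.0, 3.0, 3.0, 3.0)
    print(iou_3d_axis_aligned(predicted, ground_truth))
```

A detector whose classification score is high while this overlap is low is an instance of the mismatch that an IoU estimation module is meant to reduce.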
Keywords/Search Tags: autonomous driving, 3D object detection, semi-supervised learning, multi-perception fusion