
Research On Key Technologies Of Multi-sensor 3D Environment Perception Systems For Autonomous Driving

Posted on: 2021-01-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J R Wang
Full Text: PDF
GTID: 1362330602959985
Subject: Mechanical and electrical engineering
Abstract/Summary:
Environmental perception, as a key part of autonomous driving, is the guarantee of driving safety and intelligence. During vehicle driving, an advanced three-dimensional (3D) environment perception system can identify, segment, and locate the 3D objects that affect driving safety (such as surrounding roads, vehicles, pedestrians, and obstacles). The system obtains accurate object information (including position, size, driving direction, geometric shape, and category), which provides a sufficient basis for subsequent decision-making and operational control. With a single detection method or sensor, it is difficult to perceive complex scenes robustly and quickly, whereas multiple sensors that complement one another can provide more comprehensive, accurate, and environmentally compatible information about the surroundings, meeting the requirements of anti-interference ability and detection accuracy. In recent years, deep learning has made breakthroughs in information processing and recognition, and segmentation and detection algorithms based on deep neural networks perform outstandingly, providing new methods and ideas for the study of multi-sensor 3D environment perception systems. In this dissertation, we use a monocular camera and a 3D LIDAR, combined with deep neural networks, to conduct theoretical analysis, method research, implementation, and verification of key technologies such as multi-sensor calibration, fusion of 3D point clouds and RGB images, 3D object detection, and semantic segmentation. The main research contents are as follows.

(1) Research on multi-sensor calibration of 3D environment perception systems. For a driving environment perception system that uses a 3D LIDAR and a monocular RGB camera as sensing devices, we first analyze the principles and methods of intrinsic and extrinsic calibration of heterogeneous sensors. Second, we design a series of calibration schemes and set up an experimental system. Third, we calculate the rigid-body transformation and projection matrices of the LIDAR and camera with the help of Matlab, Robot Operating System, and Autoware, to unify the coordinate systems and build mapping models. Finally, we complete the spatial alignment and registration of the 3D point cloud data and RGB images, ensuring the precision of subsequent information fusion and 3D perception.

(2) Research on multi-sensor 3D object detection based on multi-stage complementary fusion. Difficulties in environmental perception include insufficient initial feature extraction from the source data; simple and inefficient matching and fusion of multiple modalities; and recognition, segmentation, and detection performance that is susceptible to distance, deformation, scale variation, overlap, occlusion, etc., under complex road traffic conditions. To address these, we construct a bottom-up network jointly driven by knowledge and data, taking LIDAR point clouds and RGB images as inputs. It consists of three stages (preprocessing, preliminary prediction, and fine regression) that sequentially perform data analysis, feature extraction, proposal generation, and bounding-box refinement. We adopt a multi-stage fusion strategy and design a series of targeted fusion methods (pre-fusion, anchor-fusion, and proposal-fusion) to maximize the advantages of the multimodal data. Among them, an innovative RGB-Intensity representation is proposed, which encodes the LIDAR reflection intensity onto the input image to strengthen its representational power, and an attention module, PEA, is designed and introduced to adaptively determine the "contribution" of different modality features to the network. Experiments show that our method can predict objects' categories, 3D locations, sizes, and orientations accurately and nearly in real time, achieving excellent results on the public KITTI benchmark.
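The abstract does not give implementation details for the RGB-Intensity encoding; the NumPy sketch below only illustrates the idea, assuming the calibrated intrinsic matrix K and rigid-body extrinsics (R, t) from step (1) are already available. The function name and clipping logic are illustrative, not the dissertation's actual code.

```python
import numpy as np

def rgb_intensity(image, points, intensity, K, R, t):
    """Paint LIDAR reflection intensity onto an RGB image as a 4th channel.

    image     : (H, W, 3) uint8 RGB image
    points    : (N, 3) LIDAR points in the LIDAR frame
    intensity : (N,) reflection intensities scaled to [0, 1]
    K         : (3, 3) camera intrinsic matrix
    R, t      : (3, 3) rotation and (3,) translation from LIDAR to camera
    """
    cam = points @ R.T + t                        # rigid-body transform
    front = cam[:, 2] > 0                         # keep points in front of the camera
    cam, inten = cam[front], intensity[front]

    pix = cam @ K.T                               # pinhole projection (homogeneous)
    u = (pix[:, 0] / pix[:, 2]).astype(int)
    v = (pix[:, 1] / pix[:, 2]).astype(int)

    h, w = image.shape[:2]
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)  # points that land on the image
    i_chan = np.zeros((h, w), dtype=np.float32)
    i_chan[v[ok], u[ok]] = inten[ok]              # sparse intensity channel

    rgb = image.astype(np.float32) / 255.0
    return np.dstack([rgb, i_chan])               # (H, W, 4) RGB-Intensity tensor
```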
(3) Research on a cascade enhancement-based 3D object detection method. To address the low detection precision for small objects (such as pedestrians and cyclists) in complex urban scenes, we propose a cascade enhancement-based 3D detector. First, drawing on image segmentation methods, atrous convolution and atrous spatial pyramid pooling are introduced into our feature learning network to enhance the details and semantics of the extracted feature representations. Second, a cascade iterative strategy is adopted, extending the typical two-stage detection framework into a three-stage network consisting of a region proposal subnet, a weak detector, and a strong detector. These stages are trained with increasing IoU thresholds so that each is sequentially more selective against close false positives, leveraging the well-distributed output of the former detector to train the next, higher-quality detector (a sketch of this threshold schedule follows the abstract). This progressively deeper detection method can "lock onto" small objects, reduce the rates of missed and false detections, and improve localization accuracy. A large number of experiments on the public dataset indicate that our method is superior to the most advanced methods of the same type.

(4) Research on 3D object segmentation and detection based on key-point densification and multi-attention guidance. Point clouds are sparse and unevenly distributed in density; hence, it is difficult to describe the contours of small or distant targets and to express the differences between similarly shaped objects. We propose a novel multi-task network for 3D detection and segmentation that takes point clouds as the main input, optionally complemented by images. It consists of three parts: 3D foreground segmentation and proposal generation; key-point densification; and 3D semantic segmentation with box refinement. The detection and segmentation tasks share most of the parameters and supervise and assist one another. In the full-point and region-of-interest feature extraction networks, lightweight point-wise attention (PA) and channel-wise attention (CA) modules are respectively embedded to strengthen the "skeleton" and "discriminability" information, extracting representative and targeted feature representations (a channel-attention sketch also follows the abstract). The proposed monocular-image-based pseudo-point-cloud supplementation method uses a distance-preference strategy and K-means clustering to implicitly exploit the color and texture of the image, enriching the sparse target information and balancing the density distribution. Experiments on the benchmark dataset demonstrate the outstanding scalability of the proposed method and its excellent performance on long-range and small-class detection.
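The training schedule of the three-stage cascade in (3) is described only qualitatively above. The PyTorch sketch below illustrates the increasing-IoU-threshold idea under assumed thresholds of 0.5/0.6/0.7 (a common choice in the spirit of Cascade R-CNN); `assign_labels` and the threshold values are hypothetical, not the dissertation's actual settings.

```python
import torch

# Hypothetical IoU schedule: region proposal subnet -> weak detector ->
# strong detector, each trained with a stricter positive threshold.
IOU_THRESHOLDS = [0.5, 0.6, 0.7]  # assumed values

def assign_labels(iou_matrix: torch.Tensor, stage: int):
    """Label each predicted box as positive/negative for a cascade stage.

    iou_matrix : (num_boxes, num_gt) pairwise IoU between the previous
                 stage's refined boxes and the ground-truth boxes
    returns    : (labels, matched_gt) where labels is 1 for positives
                 under this stage's threshold, else 0
    """
    best_iou, matched_gt = iou_matrix.max(dim=1)
    labels = (best_iou >= IOU_THRESHOLDS[stage]).long()
    return labels, matched_gt

# Usage: boxes surviving stage 0 are refined, re-matched, and relabeled
# under the stricter stage-1 threshold, and likewise for stage 2, so each
# detector trains on progressively higher-quality samples and becomes
# more selective against close false positives.
```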
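The exact designs of the PA and CA modules in (4) are likewise not given in the abstract. The sketch below shows one plausible lightweight channel-wise attention block in the squeeze-and-excitation style, operating on per-point features of shape (B, C, N); the class name, pooling choice, and bottleneck ratio are all assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Lightweight channel-wise attention over point features (B, C, N).

    Global context is average-pooled over the points, squeezed through a
    bottleneck MLP, and used to rescale each feature channel, emphasizing
    the more discriminative channels.
    """
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, N) per-point features
        ctx = x.mean(dim=2)                    # (B, C) global average over points
        weights = self.mlp(ctx).unsqueeze(2)   # (B, C, 1) channel weights in (0, 1)
        return x * weights                     # reweighted features, same shape

# Example: reweight 64-channel features for 1024 points
# feats = torch.randn(2, 64, 1024); out = ChannelAttention(64)(feats)
```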
Keywords/Search Tags: Autonomous Driving, Environment Perception, 3D Object Detection, 3D Semantic Segmentation, Multi-sensor Calibration, Multimodal Data Fusion, Attention Mechanism