
Research On Visual Data Generation For Autonomous Driving

Posted on: 2023-05-01    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Z B Song    Full Text: PDF
GTID: 1522307061473634    Subject: Control Science and Engineering
Abstract/Summary:
Data-driven machine learning algorithms are widely used in autonomous driving (AD) research for environmental perception, navigation, and control. However, such learning-based models depend heavily on large-scale, well-labeled training data, and high-quality manual labeling is costly and time-consuming. One possible solution is to use simulation systems to generate labeled data for AD; the other is to take real-data-driven approaches that enhance collected data and generate new data from it. In both cases, large amounts of simulated data can serve as training data for machine learning algorithms. In particular, the generation of visual data, represented by images and point clouds, is a basic task in realizing the interaction between AD systems and driving scenarios. This dissertation focuses on real-data-based visual data generation for AD. Based on collected data, methods for enhancing existing data and synthesizing new data are discussed, with emphasis on the key tasks in a real-data-based simulation pipeline: 3D reconstruction, moving-object point removal, sparse depth completion, and novel view synthesis. Since AD cars are typically equipped with a monocular camera and a 3D LiDAR, the research is conducted on images and point clouds. First, to reconstruct a globally unified point cloud map of a specific scenario, a deep direct pose estimation algorithm and a moving vehicle removal algorithm are proposed. Then, to enhance the sparse measurements of LiDAR, a self-supervised depth completion method is proposed. Finally, a novel view synthesis pipeline is developed to render arbitrary views within the reconstructed point cloud map. The specific contents and research results of this dissertation include the following aspects:

(1) Since traditional visual odometry often fails under low-texture and changing-lighting conditions, this dissertation proposes the assumption of feature-metric consistency and, based on it, develops a new direct visual-LiDAR odometry framework. In this framework, a feature pyramid is first extracted from the input images with a purpose-designed deep neural network. The relative pose of two consecutive frames is then estimated by Gauss-Newton optimization, minimizing the feature-metric error of aligned pixels. During optimization, pose estimates from high-level layers of the feature pyramid are propagated as initializations for the low-level layers. Finally, the feature-metric odometry is built with a sliding-window strategy. Compared with conventional methods, the proposed deep direct method is significantly more robust as a visual odometry front-end, exhibiting local smoothness and low drift.

(2) In 3D point cloud reconstruction, moving vehicles can leave a large accumulation of undesirable artifacts. This dissertation proposes an end-to-end pipeline for monocular vehicle distance and velocity regression, together with its application to moving vehicle removal in the reconstructed map. The distance and velocity regression model is derived by aggregating cues from both spatial and temporal images, including geometric detection cues, deep feature cues, and optical flow cues. A vehicle-centric sampling mechanism is also proposed to alleviate the effect of perspective distortion in the motion field. Moving vehicles are then identified according to their absolute velocity. Extensive qualitative and quantitative experiments on several AD datasets demonstrate the effectiveness of the method.
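The abstract gives the removal rule only as thresholding the absolute velocity, so the following is a minimal NumPy sketch of that last step. The function names, the 2D ground-plane velocity representation, and the 0.5 m/s threshold are illustrative assumptions; the regression network that would supply the relative velocities is stubbed out.

    # A minimal sketch (NumPy), assuming a 2D ground-plane velocity
    # representation; the dissertation's regression network would supply
    # rel_velocities, which is stubbed here with hand-picked values.
    import numpy as np

    def classify_moving_vehicles(rel_velocities, ego_velocity, speed_thresh=0.5):
        """Flag vehicles whose absolute speed exceeds a threshold (m/s).

        rel_velocities : (N, 2) per-vehicle velocity relative to the ego car
        ego_velocity   : (2,) ego velocity in the same frame
        """
        abs_velocities = rel_velocities + ego_velocity   # relative -> absolute
        speeds = np.linalg.norm(abs_velocities, axis=1)
        return speeds > speed_thresh                     # True = moving

    def remove_moving_points(points, point_labels, moving_mask):
        """Drop reconstructed map points assigned to a moving vehicle.

        points       : (M, 3) point cloud of the map
        point_labels : (M,) vehicle index per point, -1 for background
        moving_mask  : (N,) boolean output of classify_moving_vehicles
        """
        is_moving = (point_labels >= 0) & moving_mask[np.clip(point_labels, 0, None)]
        return points[~is_moving]

    # Example: a parked car appears to move backward at ego speed, so its
    # absolute velocity is near zero; a genuinely moving car is flagged.
    rel_v = np.array([[0.0, -10.0], [0.0, 2.0]])
    ego_v = np.array([0.0, 10.0])
    print(classify_moving_vehicles(rel_v, ego_v))        # [False  True]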
(3) A self-supervised depth completion framework is proposed to densify sparse LiDAR measurements. Incorporating the feature pyramids and relative poses from the deep direct odometry above, the framework introduces a feature-metric loss for training dense depth; the relative poses provide accurate matching points for computing this loss. After training on sequential data, the depth completion network directly infers a dense depth map from the input image and the sparse LiDAR depth. Experiments show the network's capability for dense depth generation in terms of scene-detail recovery and object edges.

(4) A new view synthesis paradigm is proposed that directly projects point clouds to images. A comprehensive discussion shows that view synthesis results are better when training directly on point clouds rather than on sparse depth images. Furthermore, a RefineNet is proposed to supply finer details and suppress unwanted visual artifacts. Extensive experiments are carried out on several datasets, covering both indoor and outdoor scenes and several different sources of point cloud data. Notably, the proposed approach can synthesize realistic RGB images from relatively sparse point clouds.
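As a rough illustration of this direct-projection paradigm (not the dissertation's actual implementation), the sketch below splats a colored point cloud into a target camera view with simple z-buffering. The pinhole intrinsics K, the world-to-camera extrinsics, and the far-to-near splatting order are assumptions; the subsequent hole-filling refinement (RefineNet in the dissertation) is only noted, not implemented.

    # A minimal sketch (NumPy) of projecting a colored point cloud into a
    # target view; K, T_cam_world, and the z-buffer splatting are assumptions.
    import numpy as np

    def project_points_to_view(points, colors, K, T_cam_world, h, w):
        """Splat world-frame points into an (h, w) camera image.

        points : (N, 3) XYZ; colors : (N, 3) RGB in [0, 1];
        K : (3, 3) pinhole intrinsics; T_cam_world : (4, 4) extrinsics.
        Returns a sparse RGB image and a depth buffer; a refinement network
        would then add detail and suppress artifacts.
        """
        pts_h = np.hstack([points, np.ones((len(points), 1))])
        pts_cam = (T_cam_world @ pts_h.T).T[:, :3]       # world -> camera frame

        front = pts_cam[:, 2] > 1e-3                     # keep points ahead of camera
        pts_cam, cols = pts_cam[front], colors[front]

        uv = (K @ pts_cam.T).T                           # perspective projection
        u = (uv[:, 0] / uv[:, 2]).astype(int)
        v = (uv[:, 1] / uv[:, 2]).astype(int)
        z = pts_cam[:, 2]

        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        u, v, z, cols = u[inside], v[inside], z[inside], cols[inside]

        order = np.argsort(-z)                           # far first, so near overwrites
        image = np.zeros((h, w, 3))
        depth = np.full((h, w), np.inf)
        image[v[order], u[order]] = cols[order]
        depth[v[order], u[order]] = z[order]
        return image, depth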
Keywords/Search Tags: Point Cloud and Image Fusion, Localization and Mapping, Depth Completion, Image Generation