| With the development of sensor technology, the medium through which computers perceive the world has gradually evolved from 2D data, e.g., images, to 3D data, e.g., depth images and 3D point clouds. A 3D point cloud is a set of discrete points distributed in three-dimensional space that describes the shape of an object. It is usually obtained with a LiDAR sensor, a 3D laser scanner, or a structure-from-motion algorithm. As an unordered and sparse data structure, a point cloud represents the same object under any permutation of its points, and its shape is preserved under translation and rigid rotation. Recently, deep learning-based methods such as Convolutional Neural Networks (CNNs) have become state-of-the-art approaches for solving problems in computer vision and graphics. However, CNNs are better suited to processing structured data such as images and video. Applying recent advances in deep neural networks to the intelligent perception of 3D point cloud data has therefore become an active research topic. This dissertation focuses on intelligent perception algorithms for 3D point cloud data. With the help of deep learning and related techniques, it addresses three tasks for 3D point clouds: classification, detection, and depth completion. Starting from an analysis of the characteristics of existing approaches, the dissertation exploits the inherent properties of the data and weighs the advantages and disadvantages of those approaches. Several novel data representations, machine learning models, and training strategies are proposed. First, the dissertation draws on the idea of fusing global and local information from CNNs in the image processing domain. Specifically, the small-scale neighborhood feature of each point and the point-wise features of the overall point cloud are extracted. By using convolutional layers to fuse these features, the proposed method classifies point clouds accurately. Subsequently, by explicitly considering the uneven distribution of LiDAR 
point clouds, a voxel representation and sparse three-dimensional convolution are proposed for data processing. Specifically, real-time 3D object detection in road scenes is achieved. Finally, observing the uneven and sparse distribution of the point cloud, we hypothesize that this characteristic is one of the main factors limiting the performance of object detection and reconstruction. Based on this assumption, image-guided point cloud completion methods are proposed in this dissertation. To verify the effectiveness of these methods, extensive experiments are conducted on several publicly available datasets. In summary, the main contributions and innovations of this dissertation are as follows: 1) This dissertation proposes a permutation-invariant dual-pathway deep network for point cloud classification. The structure of this network is based on a permutation-invariant function. The extension of point-wise features and the extraction of local features are achieved by its point-wise and neighbor-wise representations, respectively. Experiments on ModelNet40 demonstrate that the model achieves state-of-the-art accuracy with only 0.8 million parameters. Further experiments verify the effectiveness of this design from two perspectives, i.e., network structure and feature visualization. 2) This dissertation proposes a deep learning framework for object detection from LiDAR point clouds in road scenes. A sparse voxel data representation is adapted to organize the unordered, sparse, and irregular point cloud. Furthermore, a single-stage 3D object detection network is constructed by combining sparse 3D convolution with traditional 2D convolution. This model requires only about 28% of the memory and brings a 2x speedup compared with a fully traditional convolution setting. Meanwhile, the proposed 3D bounding box representation and incremental data augmentation increase the accuracy of direction angle prediction and detection 
by about 3% and 9%, respectively. Experiments on a public dataset demonstrate that this framework achieves a detection speed of 40 FPS with advanced detection accuracy. 3) Image-guided depth completion for LiDAR point clouds is addressed in this dissertation. The active acquisition process of the LiDAR sensor makes the density of the generated point cloud inversely proportional to the distance of objects, which complicates tasks such as object detection and 3D reconstruction. Related work has shown that the missing depth information can be inferred by exploiting the rich texture and semantic information in color images with a CNN. To this end, two CNN-based models are proposed in this dissertation. The first model uses a multi-modality fusion scheme. Sparsity-invariant convolution is used to construct this network. It extracts depth features from point clouds at different distances and with various densities. In addition, several mask-aware network modules are used to recover a high-precision dense depth map. Noting the limitations of sparsity-invariant convolution, the second model introduces a more effective depth-aware non-local convolution operation to process the sparse depth map. Additionally, it uses a symmetric co-attention module to extract and fuse features from the different modalities. The second model reduces the number of network parameters and the RMSE by about 27% and 4%, respectively. Finally, the dissertation verifies the robustness of these models under various point cloud density patterns. Overall, this dissertation studies several intelligent perception methods for 3D point clouds. Specifically, the tasks of classification, detection, and completion are accomplished with advanced performance and inference speed. In addition, the proposed point cloud feature extraction method, sparse data processing algorithm, and multi-modal data fusion scheme have theoretical and practical value. |
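
Two ideas named in the abstract, a permutation-invariant function over points and a sparse voxel representation, can be illustrated with a minimal sketch. This is not the dissertation's network or its voxel encoder. It assumes a toy shared linear map with ReLU followed by max pooling as the symmetric function, and a plain dictionary keyed by integer voxel indices as the sparse structure. The names `pointwise_features`, `global_feature`, and `voxelize`, the random weights, and the voxel size are all hypothetical choices made for illustration.

```python
import math
import random
from collections import defaultdict


def pointwise_features(point, weights):
    """Apply the same (shared) linear map followed by ReLU to one point."""
    return [max(0.0, sum(w * c for w, c in zip(row, point))) for row in weights]


def global_feature(points, weights):
    """Max-pool the shared per-point features over all points.

    Max is a symmetric function, so the result does not depend on the
    order in which the points are listed.
    """
    feats = [pointwise_features(p, weights) for p in points]
    return [max(column) for column in zip(*feats)]


def voxelize(points, voxel_size):
    """Group points by integer voxel index, storing only occupied voxels."""
    voxels = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / voxel_size) for c in p)
        voxels[key].append(p)
    return dict(voxels)


# Hypothetical toy data: 64 random points and an 8x3 random weight matrix.
rng = random.Random(0)
cloud = [tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(64)]
weights = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(8)]

# Shuffling the points leaves the pooled feature unchanged.
shuffled = cloud[:]
rng.shuffle(shuffled)
order_independent = global_feature(cloud, weights) == global_feature(shuffled, weights)

# Only voxels that contain at least one point appear in the dictionary.
occupied = voxelize(cloud, voxel_size=0.5)
```

In this sketch, `order_independent` evaluates to `True` because each point is transformed independently and the pooling step is symmetric. The dictionary returned by `voxelize` holds only occupied cells, which is the property that sparse convolution relies on to avoid computing over empty space.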