Scene understanding is an important branch of computer vision. It aims to enable computers to understand scenes as humans do, and it is the basis of environmental perception and visual navigation in modern intelligent systems. Scene segmentation, which provides a complete and fine-grained understanding of a scene, is the key to scene understanding. It has been widely applied in autonomous driving, intelligent robots, and augmented reality, and thus has significant value in both research and application. Although classical image-based scene segmentation has achieved resounding success, the current research still faces shortcomings and challenges. On the one hand, because 2D images cannot capture the structural information of a scene and are easily disturbed by natural environments and imaging conditions, they have essential limitations in natural scenes, especially outdoor scenes. On the other hand, compared with 2D images, 3D point clouds contain richer structural information, which is conducive to scene segmentation. However, owing to their data characteristics, many traditional algorithms designed for 2D images, especially deep learning methods, cannot be directly applied to 3D point clouds. To this end, this thesis studies 3D point cloud-based scene segmentation from the perspectives of multi-modal data fusion and point cloud feature learning. First, in view of the success of image-based scene segmentation, fusion models of 2D images and 3D point clouds are built to compensate for the shortcomings of a single sensor and thereby improve the robustness of scene segmentation. Second, feature encoding and decoding methods for point cloud segmentation are studied, and feature extraction operators for point clouds are constructed; efficient point cloud semantic segmentation is achieved by improving the representation ability of point cloud features. Specifically, the main work of this thesis is as follows:

A road segmentation model based on a fusion hierarchical CRF is proposed to improve the robustness of road segmentation. Multiscale image features enlarge the receptive field of the model, and the spatial structure features of point clouds compensate for the representation limitations of pixel features. To combine them efficiently, an unsupervised segmentation is first employed to generate multiscale image superpixels, and a hierarchical CRF over the image is built. Correspondingly, the point cloud is projected onto the image plane, and a hierarchical CRF over the point cloud is built. Finally, the pixels and points in the base layers of the two hierarchical CRFs are connected to form a fused hierarchical model. Experiments on the KITTI road benchmark show that the proposed method effectively eliminates the interference of shadows and lighting in road scenes.

A road segmentation network based on spatial propagation and transformation fusion is proposed to enhance the accuracy of road representation. To exploit deep networks and multi-modal fusion, a simple but efficient lightweight network is first devised for unordered and sparse point clouds to obtain a rough representation of the road area. Then an equal-resolution convolution block is employed to capture low-level image features, which are used to generate the diffusion coefficients of a joint anisotropic diffusion-based spatial propagation model. Under the guidance of the learned low-level image features, the rough representation is diffused in both the perspective view and the bird's-eye view via spatial transformations in the network. Finally, the diffusion results of the two views are integrated to generate a fine-grained representation of the road area. Without any additional data augmentation or pretraining, the proposed method obtains competitive results on the KITTI road benchmark.

A point cloud segmentation algorithm based on continuous CRF graph convolution (CRFConv) is proposed, which enhances the localization ability of the network and thereby improves segmentation performance. The CRF is usually formulated as a discrete model in label space to encourage label consistency, which is in effect a kind of postprocessing. Instead, this thesis reconsiders the CRF in the feature space of point clouds, where it can capture the structure of features and fundamentally improve their representation ability rather than simply smoothing labels. First, the solution process of the CRF in feature space is reformulated as a message-passing graph convolution. This convolution is then embedded in the decoding layers of the segmentation network to restore details of high-level features that are lost in the encoding stage. Experiments on various point cloud segmentation benchmarks show the effectiveness of the proposed method.

A point cloud learning algorithm based on adaptive GMM convolution (AGMMConv) is proposed, which improves the adaptability of convolution kernels to local geometric structures. The success of CNNs is mainly attributed to the (translation) invariance and local pattern-matching effect of convolution kernels. Inspired by this, a GMM is proposed to represent discrete convolution kernels, where the mean vectors are the locations of the kernel points and the covariance matrices determine the shape of each kernel. Meanwhile, the GMM learned from local observations is also a probability distribution over the local geometric surface, which makes it adaptive to local geometric structures. The proposed convolution is essentially invariant to permutation and translation; in addition, potential rotation invariance can be induced from the probabilistic representation. In the convolution, a set of shared weights is associated with each GMM kernel point to match local patterns of the point cloud. Experiments on object-level and scene-level point cloud datasets demonstrate the effectiveness and robustness of the proposed method.
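The point-to-pixel association that connects the base layers of the two hierarchical CRFs rests on a standard pinhole projection. The following minimal sketch illustrates that step only; the function name is illustrative, and on KITTI the 3x4 projection matrix would come from the calibration files rather than being supplied by hand:

```python
import numpy as np

def project_points(points, P):
    """Project 3D LiDAR points onto the image plane with a 3x4 camera matrix P.

    This pairs each 3D point with a pixel so that points in the point-cloud CRF
    can be connected to pixels (or superpixels) in the image CRF.
    """
    homo = np.hstack([points, np.ones((len(points), 1))])  # (N, 4) homogeneous coords
    proj = homo @ P.T                                      # (N, 3) camera coords
    uv = proj[:, :2] / proj[:, 2:3]                        # perspective divide
    return uv
```

Points projecting outside the image bounds (or behind the camera) would be discarded before building the cross-modal links.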
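The spatial propagation model can be pictured as iterated anisotropic diffusion over the coarse road map, gated per direction by learned coefficients. Below is a rough sketch under simplifying assumptions: the step size 0.2, the four-direction scheme, and the function name are all illustrative, and the coefficients (predicted from low-level image features in the thesis) are plain inputs here:

```python
import numpy as np

def anisotropic_propagation(coarse, coeffs, n_iters=10):
    """Diffuse a coarse road map under per-direction diffusion coefficients.

    coarse: (H, W) rough road confidence (e.g. from the point-cloud branch)
    coeffs: (4, H, W) non-negative coefficients for up/down/left/right flow
    """
    x = coarse.copy()
    for _ in range(n_iters):
        # Differences toward each of the four neighbors
        up    = np.roll(x,  1, axis=0) - x
        down  = np.roll(x, -1, axis=0) - x
        left  = np.roll(x,  1, axis=1) - x
        right = np.roll(x, -1, axis=1) - x
        # Anisotropic diffusion step: each directional flow is gated
        # by its own coefficient map
        x = x + 0.2 * (coeffs[0] * up + coeffs[1] * down
                       + coeffs[2] * left + coeffs[3] * right)
    return x
```

In the full method this diffusion is applied in both the perspective and bird's-eye views, with a spatial transformation mapping between the two before the results are fused.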
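The idea of solving a CRF in feature space by message passing can be sketched as a mean-field-style update on a k-nearest-neighbor graph. This is a simplified illustration, not the thesis's CRFConv: the uniform neighbor averaging, the blending weight `alpha`, and the function name are assumptions standing in for the learned pairwise kernel:

```python
import numpy as np

def crf_feature_update(feats, neighbors, unary, alpha=0.5, n_iters=3):
    """Mean-field-style refinement of point features on a kNN graph.

    feats:     (N, C) current point features (the variables being refined)
    neighbors: (N, K) indices of each point's K nearest neighbors
    unary:     (N, C) encoder features acting as the data (unary) term
    alpha:     illustrative weight balancing unary vs. pairwise messages
    """
    for _ in range(n_iters):
        # Pairwise message: aggregate neighbor features (a learned kernel
        # would weight these; a uniform mean keeps the sketch simple)
        messages = feats[neighbors].mean(axis=1)   # (N, C)
        # Mean-field update: blend the data term with the smoothness term
        feats = alpha * unary + (1.0 - alpha) * messages
    return feats
```

Embedded in the decoder, each iteration of such an update is one message-passing graph convolution, so the structure of neighboring features, rather than neighboring labels, is what gets regularized.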
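The GMM-kernel convolution can be made concrete for a single query point: each neighbor is softly assigned to the Gaussian kernel components, and the shared weights tied to each component are then applied. This is a minimal sketch of that mechanism (the function name and the dense per-point loop are illustrative; a real implementation would batch this):

```python
import numpy as np

def agmm_conv_point(center, neighbors_xyz, neighbor_feats,
                    means, covs, weights_out):
    """Convolution at one point with a GMM-shaped kernel (illustrative).

    center:         (3,) query point
    neighbors_xyz:  (K, 3) neighbor coordinates
    neighbor_feats: (K, Cin) neighbor features
    means:          (M, 3) kernel-point locations (GMM means)
    covs:           (M, 3, 3) per-component covariances (kernel shapes)
    weights_out:    (M, Cin, Cout) shared weights tied to each kernel point
    """
    rel = neighbors_xyz - center                   # gives translation invariance
    # Soft assignment of each neighbor to each Gaussian kernel point
    resp = np.empty((len(rel), len(means)))        # (K, M)
    for m, (mu, cov) in enumerate(zip(means, covs)):
        diff = rel - mu
        inv = np.linalg.inv(cov)
        maha = np.einsum('ki,ij,kj->k', diff, inv, diff)
        resp[:, m] = np.exp(-0.5 * maha) / np.sqrt(
            (2 * np.pi) ** 3 * np.linalg.det(cov))
    resp /= resp.sum(axis=1, keepdims=True) + 1e-9  # normalized responsibilities
    # Neighbors vote into each component; shared weights then mix channels
    per_comp = resp.T @ neighbor_feats              # (M, Cin)
    out = np.einsum('mc,mcd->d', per_comp, weights_out)  # (Cout,)
    return out
```

Because the responsibilities are computed per neighbor and summed, the output is unchanged under any reordering of the neighbors, matching the permutation invariance claimed for the convolution.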