With the development of 3D sensing technology, 3D point cloud data is now widely used in real-world applications, laying a solid foundation for accurate, computer-based environmental perception and understanding of 3D scenes, and providing important support for major national needs such as the construction of a real-scene 3D China and the management and monitoring of land resources. As a key task in 3D scene understanding, semantic segmentation of 3D point cloud scenes aims to assign a semantic label to every point in the scene using methods from computer vision and pattern recognition, so that the categories of objects in the scene can be described accurately. It is a key technology for representing the objective world in virtual form, a hot topic of academic research, and has broad application prospects in fields such as autonomous driving, smart cities, and virtual reality. Studying the key technologies of semantic segmentation for large-scale 3D point cloud scenes is therefore of great theoretical value and practical significance.

Earlier point cloud semantic segmentation methods relied on the design and selection of hand-crafted features, so their results depended heavily on the researchers' experience; such methods could not describe higher-order semantic information and struggled to adapt to complex, large-scale real-world scenes. In recent years, deep neural networks, with their powerful feature learning capability, have attracted extensive academic attention and achieved remarkable results in natural language processing and 2D image understanding, but their application to disordered, non-uniform, and unstructured 3D point cloud data is still at a relatively early stage. This thesis focuses on the key technologies of deep-learning-based semantic segmentation for large-scale point cloud scenes, in order to provide
feasible technical directions and guidance for the intelligent understanding of large-scale 3D point cloud scenes, and to further promote the development of deep learning technology. The main research work of this thesis is as follows:

1. A graph-neural-network-based data augmentation method for 3D point cloud semantic segmentation. Because 3D point cloud data is difficult to label and large-scale scenes involve enormous amounts of data, it is hard in practice to obtain enough training samples with accurate semantic labels for the effective training of data-driven deep neural networks. To address this problem, this thesis proposes a random graph method for point cloud data augmentation. First, a simple and efficient random graph construction is designed: when building the adjacency graph of a point cloud with the K-nearest-neighbor search algorithm, neighboring edges are randomly dropped from the graph topology, so that a different adjacency graph is used in each training epoch as a form of data augmentation. Second, a simplified graph convolution is designed to reduce the computational overhead of the graph convolution step. Finally, a feature extraction module is designed to capture local context efficiently by aggregating the spatial and semantic information of points. Since this augmentation method requires no additional training data, it is well suited to the semantic segmentation of large-scale 3D point cloud scenes with large data volumes. Experiments on the indoor dataset S3DIS and the outdoor dataset Toronto3D demonstrate that the proposed method improves segmentation performance when training samples are insufficient.

2. A 3D point cloud semantic segmentation method based on multi-dimensional local information encoding. In large-scale 3D point cloud scenes, the spatial
geometric structures of some objects differ little from one another, making effective segmentation based on spatial geometry alone difficult; examples include the walls, windows, and wooden panels common in indoor scenes, and the roads and lane markings common in outdoor scenes. This thesis first proposes a multi-dimensional local information encoding method that aggregates the geometric, color, and semantic information of neighboring points onto the center point. Second, a cross-information encoding method is designed that enables the network to fully exploit the spectral color information of point clouds, addressing the difficulty existing methods have in recognizing scenes with insignificant geometric differences. Third, a joint pooling method is designed so that the network considers both the most salient local features and the whole local neighborhood, strengthening its ability to learn local information. Finally, a residual dense connection module is designed that lets the network attend to receptive fields of different scales simultaneously, further enhancing its perception of local regions. Experiments on the indoor dataset S3DIS and the outdoor datasets Toronto3D and Semantic3D verify that the proposed method significantly improves the segmentation of local regions with insignificant geometric differences.

3. A 3D point cloud semantic segmentation method based on local-global information enhancement. The features of points distributed along object boundaries interfere with one another, making it difficult for the network to segment object boundaries accurately; at the same time, the network struggles to capture the global information distributed across a large-scale scene. To address these problems, this thesis designs a local adaptive feature enhancement method for adaptively learning the similarity between
central and neighboring points and using the learned similarity weights to direct the network's attention to different boundary points, thereby enhancing the local context. Second, drawing on the VLAD (Vector of Locally Aggregated Descriptors) module from 2D image retrieval, an integrated global feature enhancement method is designed to learn a more complete and comprehensive global description vector from multi-level local features. Finally, a locally optimized aggregation loss function is designed that constrains the adaptive weights in the local adaptive feature enhancement method, speeding up network convergence and effectively refining the segmentation boundaries. Experiments on the indoor dataset S3DIS, the outdoor street dataset Toronto3D, and the outdoor urban dataset SensatUrban verify that the proposed method significantly improves boundary segmentation.

4. A deeply supervised neural network architecture based on multi-scale fusion for 3D point cloud semantic segmentation. Exploring multi-scale features is crucial for handling the complex scale variations in point clouds. Large-scale 3D point cloud scenes contain objects of very different sizes at the same time, such as the large buildings and small streetlights common in outdoor scenes, and it is difficult for a network to learn the local details of objects at different scales simultaneously. To address this problem, this thesis combines the design ideas of the encoder-decoder architecture and deep supervision to design a new, general nested backbone network architecture. First, the thesis analyzes in depth the evolution of the U-Net architecture in the 3D point cloud domain and identifies the most basic encoder-decoder architecture, U-Net L1, as a suitable building block for fine-grained 3D point cloud semantic segmentation. Second, the thesis proposes a general and effective segmentation framework that narrows the semantic gap between encoder and decoder by
stacking multiple U-Net L1 sub-networks, repeatedly learning multi-scale feature maps of point clouds down-sampled at different resolutions through these sub-networks, with local feature maps able to flow freely laterally, upward, or downward. Finally, the thesis proposes a multi-level deep supervision method that introduces a supervision signal at each decoding node, making gradient propagation smoother and the network easier to train. Experiments with the baseline models PointNet++, RandLA-Net, and BAAF-Net on the large-scale 3D point cloud benchmark datasets S3DIS, Toronto3D, and SensatUrban verify that the proposed architecture achieves consistent and significant improvements in semantic segmentation.
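The random edge-dropping idea behind contribution 1 can be illustrated with a minimal NumPy sketch. This is not the thesis's implementation: the function names (`knn_graph`, `random_edge_drop`), the brute-force neighbor search, the self-loop fallback for dropped edges, and the drop rate of 0.2 are all assumptions made for the example.

```python
import numpy as np

def knn_graph(points, k):
    """Brute-force K-nearest-neighbor adjacency: for each point, the
    indices of its k nearest neighbors (the point itself excluded)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                    # never pick yourself
    return np.argsort(d, axis=1)[:, :k]            # (N, k) neighbor indices

def random_edge_drop(neighbors, drop_rate, rng):
    """Randomly discard a fraction of neighbor edges; a dropped slot
    falls back to the point's own index (a self-loop), keeping the
    (N, k) shape so downstream graph convolutions are unchanged."""
    n, k = neighbors.shape
    keep = rng.random((n, k)) >= drop_rate         # True = edge survives
    self_idx = np.repeat(np.arange(n)[:, None], k, axis=1)
    return np.where(keep, neighbors, self_idx)

rng = np.random.default_rng(0)
pts = rng.random((64, 3))                          # toy point cloud
nbrs = knn_graph(pts, k=8)
aug_a = random_edge_drop(nbrs, drop_rate=0.2, rng=rng)
aug_b = random_edge_drop(nbrs, drop_rate=0.2, rng=rng)
# A fresh mask each epoch yields a different graph topology at no
# extra labeling cost.
```

Because each epoch draws a new mask, the network sees a slightly different neighborhood structure every time, which is what makes the scheme act as augmentation without any additional input data.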
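Likewise, the joint pooling of contribution 2 — keeping the most salient local feature while still summarizing the whole neighborhood — might look like the following sketch, assuming max and mean pooling concatenated over a `(points, neighbors, channels)` feature tensor; the thesis does not specify this exact formulation.

```python
import numpy as np

def joint_pool(neigh_feats):
    """Joint pooling over a (N, k, C) neighborhood feature tensor:
    max pooling keeps the single strongest activation per channel,
    mean pooling summarizes the whole neighborhood, and the two
    views are concatenated for the layers that follow."""
    f_max = neigh_feats.max(axis=1)                  # (N, C) strongest response
    f_mean = neigh_feats.mean(axis=1)                # (N, C) neighborhood average
    return np.concatenate([f_max, f_mean], axis=-1)  # (N, 2C)

# 2 center points, 3 neighbors each, 4 feature channels
feats = np.arange(24, dtype=float).reshape(2, 3, 4)
pooled = joint_pool(feats)
```

Max pooling alone discards how typical a feature is of the neighborhood, while mean pooling alone dilutes the most discriminative response; concatenating both lets later layers weigh the two views as needed.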