| With the development of 3D sensor technology, 3D data has become progressively easier to acquire. Complex real-world environments can be represented as point clouds, which store every object in a space as a set of 3D points, each carrying multi-dimensional information that records the original properties of the object. Point cloud semantic segmentation must establish an accurate 3D mask for each object in the space and assign a point-level class to every spatial point. As a crucial technology for spatial scene understanding, it has attracted great attention in recent years in fields such as autonomous driving, intelligent robotics, and computer vision. Neural networks with an encoder-decoder structure already meet pixel-level classification requirements in 2D image processing, but their use in 3D data processing still needs development. To handle the irregular structure of point clouds, a network needs dedicated structures to extract features effectively. (1) Point clouds are rotation invariant: the relative positions of all points remain unchanged after a rigid transformation, and the network must take this property into account to perform point-level classification accurately. (2) A point cloud contains many points and is difficult to process directly, so the encoder down-samples it layer by layer to make feature extraction more efficient; however, repeated down-sampling loses spatial geometric information. (3) Semantic segmentation demands high accuracy, yet the features of the decoder's hidden layers are not supervised, so repeated interpolation can produce incorrect segmentation results. The network proposed in this paper addresses each of these problems. (1) For rotation invariance, this paper designs a local geometric encoding module that explicitly extracts rotation-insensitive feature representations by exploiting the relative angular relationships between spatial points expressed in polar coordinates. The module also computes local normal information to enhance local semantic information and enrich the geometric features of the point cloud. (2) For the loss of geometric information, this paper introduces a multi-decoder ensemble module that restores the features of each encoder layer and uses a fusion mechanism to obtain a unified segmentation result containing multi-scale information. (3) For the unsupervised hidden layers, this paper uses contrastive learning to regularize the decoder output: multi-hot labels generated from the hidden layers of the encoder supervise the multi-hot labels predicted from the hidden layers of the decoder. Contrastive learning can also be combined with the multi-decoder ensemble module to form a multi-scale supervised network that further refines the spatial segmentation results. To verify the effectiveness of the proposed methods, this paper reports segmentation results on several public datasets, compares them with a range of other networks, and conducts ablation experiments for each proposed component. The experiments show that the proposed method achieves strong point cloud semantic segmentation performance and produces accurate point-level classification masks across a variety of complex spatial scenes. |
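
The idea behind rotation-insensitive local features can be illustrated with a minimal sketch. This is not the paper's local geometric encoding module: the function names (`local_features`, `rotate_z`) and the choice of reference direction (from the center point to the neighborhood centroid) are assumptions made only for illustration. The sketch describes each neighbor by its distance to the center point and by its angle relative to that reference direction, both of which depend only on relative positions and should therefore be unchanged by a rigid rotation.

```python
import math


def sub(a, b):
    """Component-wise difference of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def norm(v):
    """Euclidean length of a 3D vector."""
    return math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)


def angle_between(u, v):
    """Angle in radians between two vectors (0.0 if either has zero length)."""
    nu, nv = norm(u), norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    cos = (u[0] * v[0] + u[1] * v[1] + u[2] * v[2]) / (nu * nv)
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding error


def local_features(center, neighbors):
    """Return one (distance, angle) pair per neighbor.

    Hypothetical encoding: the angle is measured against the direction
    from the center point to the centroid of the neighborhood.
    """
    n = len(neighbors)
    centroid = tuple(sum(p[i] for p in neighbors) / n for i in range(3))
    ref = sub(centroid, center)
    feats = []
    for p in neighbors:
        rel = sub(p, center)
        feats.append((norm(rel), angle_between(rel, ref)))
    return feats


def rotate_z(p, theta):
    """Rotate a 3D point about the z-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1], p[2])


# Compare features before and after rotating the whole neighborhood.
center = (0.5, 0.2, 0.1)
neighbors = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.5), (0.2, 0.3, 1.0), (-0.5, 0.4, 0.2)]
theta = 0.7
before = local_features(center, neighbors)
after = local_features(rotate_z(center, theta),
                       [rotate_z(p, theta) for p in neighbors])
# Largest difference between corresponding feature values.
gap = max(abs(a - b) for fb, fa in zip(before, after) for a, b in zip(fb, fa))
```

In this sketch `gap` should differ from zero only by floating-point rounding, whereas the raw coordinates of the rotated points change substantially. A learned module would feed such relative quantities, possibly together with estimated normals, into the network instead of absolute coordinates.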