3D scene understanding is an important research direction in computer vision and machine learning. Its goal is to enable computers to perceive and understand the real world as humans do, using 3D sensor data and deep learning algorithms. It underpins interaction between computers and the physical world and is a core technology for applications such as autonomous driving, robot navigation, virtual reality, and augmented reality. In general, scene understanding can be interpreted as the task of extracting semantic information from the geometric scene, i.e. semantic segmentation. Current mainstream solutions for this task fall into two types: end-to-end semantic segmentation of the entire scene, and simultaneous real-time reconstruction and semantic segmentation. Building on the existing research, this thesis studies real-time semantic segmentation performed jointly with reconstruction. The main contributions are as follows:

(1) A real-time semantic segmentation system based on scene geometry pre-segmentation is proposed. It extracts semantic information on an object-by-object basis in order to address the poor extraction of semantic information at object edges in the two mainstream solutions mentioned above. During reconstruction, the system continuously detects the main planes of the indoor scene in real time to obtain the point cloud objects in the current scene, and applies geometric clustering segmentation so that objects and object sets remain mutually independent. After reconstruction is completed, semantic information is extracted, achieving object-level scene semantic segmentation and improving the quality of semantic information at object edges.

(2) Two solutions are proposed for extracting the main planes of a point cloud scene: an offline solution and a real-time solution. The offline solution takes the entire point cloud scene as input, reduces the computational cost through supervoxel segmentation, performs geometric clustering on the segmentation results, and then applies the RANSAC algorithm to extract the parts of each cluster that belong to the main planes. The real-time solution detects plane information by analyzing the depth maps of the RGB-D sequence fed into the reconstruction system; different constraints are then applied for walls and floors to extract the main planes from the detected planes, while maintaining millisecond-level speed.
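As an illustration of the offline branch, the minimal sketch below shows how main planes might be pulled out of pre-clustered point clouds with RANSAC, using Open3D's segment_plane and a gravity-alignment check to separate floor-like from wall-like planes. The thresholds, the assumed gravity axis, and the helper name extract_main_planes are illustrative assumptions, not the exact parameters or interfaces of this work.

```python
# Minimal sketch of RANSAC-based main-plane extraction (offline branch).
# Assumes the geometric clusters are available as open3d.geometry.PointCloud
# objects; thresholds and the gravity axis are illustrative assumptions.
import numpy as np
import open3d as o3d  # clusters are expected to be o3d.geometry.PointCloud

def extract_main_planes(clusters, dist_thresh=0.02, min_inliers=5000):
    """Run RANSAC on each cluster and keep large, axis-consistent planes."""
    planes, remainder = [], []
    up = np.array([0.0, 1.0, 0.0])  # assumed gravity direction
    for cloud in clusters:
        model, inliers = cloud.segment_plane(distance_threshold=dist_thresh,
                                             ransac_n=3,
                                             num_iterations=1000)
        normal = np.asarray(model[:3])
        normal /= np.linalg.norm(normal)
        cos_up = abs(np.dot(normal, up))
        # Floors/ceilings are near-horizontal, walls near-vertical;
        # anything else is treated as an object rather than a main plane.
        is_floor_like = cos_up > 0.95
        is_wall_like = cos_up < 0.05
        if len(inliers) >= min_inliers and (is_floor_like or is_wall_like):
            planes.append(cloud.select_by_index(inliers))
            remainder.append(cloud.select_by_index(inliers, invert=True))
        else:
            remainder.append(cloud)
    return planes, remainder
```

In the real-time branch, an analogous constraint check would operate on planes detected from the incoming depth maps rather than on the accumulated point cloud.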
(3) A point cloud clustering algorithm based on multi-view simulation is proposed. After the main plane detection algorithm is applied, the main planes in the current scene are removed to obtain the point cloud objects. These objects are rendered from multiple virtual camera views into projection images and corresponding depth maps, and the projection images are used as a stand-in for the point cloud itself. Superpixel segmentation and clustering are performed on the image of each viewpoint, and the per-view results are fused through coordinate transformation to obtain the clustering result of the point cloud objects. By replacing the 3D operation with this equivalent image-based alternative, the algorithm greatly reduces the computational cost of point cloud clustering: while matching the accuracy of point cloud Euclidean clustering, it runs at millisecond-level efficiency, which allows it to execute during real-time reconstruction and to combine with the main plane detection algorithm to realize the geometric segmentation scheme of this work and improve the accuracy of the final semantic segmentation result.
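The core of the multi-view simulation is to project the plane-free point cloud into a few virtual pinhole cameras, segment the rendered images in 2D, and carry the 2D labels back to the 3D points. The sketch below illustrates only that projection and label back-mapping in plain NumPy; the camera parameters, image size, and the source of pixel_labels (e.g. a superpixel clustering such as SLIC) are assumptions for illustration, not the exact rendering or fusion procedure used in this work.

```python
# Minimal sketch: project a point cloud into a virtual view with a z-buffer,
# then map per-pixel segment labels back to the 3D points.
# Camera intrinsics/extrinsics and helper names are illustrative assumptions.
import numpy as np

def render_index_map(points, R, t, fx, fy, cx, cy, width, height):
    """Pinhole projection; returns a depth map and a per-pixel point index map."""
    cam = points @ R.T + t                    # world -> camera coordinates
    z = cam[:, 2]
    valid = z > 1e-6
    u = np.round(fx * cam[:, 0] / z + cx).astype(int)
    v = np.round(fy * cam[:, 1] / z + cy).astype(int)
    valid &= (u >= 0) & (u < width) & (v >= 0) & (v < height)

    depth = np.full((height, width), np.inf)
    index = np.full((height, width), -1, dtype=int)
    for i in np.flatnonzero(valid):           # keep the nearest point per pixel
        if z[i] < depth[v[i], u[i]]:
            depth[v[i], u[i]] = z[i]
            index[v[i], u[i]] = i
    return depth, index

def backproject_labels(index_map, pixel_labels, num_points):
    """Transfer per-pixel segment labels (e.g. from superpixels) to the points."""
    point_labels = np.full(num_points, -1, dtype=int)
    seen = index_map >= 0
    point_labels[index_map[seen]] = pixel_labels[seen]
    return point_labels
```

A full pipeline would repeat this projection for several virtual viewpoints, obtain pixel_labels from a superpixel clustering of each rendered image, and merge the resulting per-point labels across views; voting among views is one plausible fusion rule, shown here purely as an illustration of the idea.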