
Towards 3D Reconstruction And Semantic Mapping For Indoor Scenes

Posted on: 2017-02-11    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Z Zhao    Full Text: PDF
GTID: 1108330491960000    Subject: Computer application technology
Abstract/Summary:
Scene perception and understanding is a long-term goal of computer vision, artificial intelligence, and intelligent robotics. Recently, with the development of depth sensors (e.g., Kinect), indoor scene understanding has attracted considerable attention. This dissertation focuses on indoor scene reconstruction and understanding, covering RGB-D reconstruction, RGB-D semantic segmentation, and semantic mapping, as follows:

First, we register RGB-D images and reconstruct the global scene. To build a large-scale 3D map, image alignment techniques register the images, while loop detection and pose optimization eliminate accumulated error between them. Finally, we represent the 3D scene as a point cloud or a 3D mesh.

Second, we propose a novel RGB-D semantic segmentation approach whose goal is to assign semantic labels to every pixel of the image. In our work, each pixel receives two kinds of semantic labels: an object category and a structural class. The structural class gives a clear view of the structure of the indoor scene, while the object category provides a more detailed view. A conditional random field (CRF) model infers the semantic segmentation of each RGB-D image.

Third, by combining RGB-D reconstruction with RGB-D semantic segmentation, we build semantic maps for indoor scenes. However, the semantic segmentations of the individual RGB-D images are not temporally consistent. We obtain temporal information by computing correspondences between superpixels and model it through higher-order potentials. Using this higher-order CRF model, we obtain a temporally consistent semantic map.

Fourth, manually segmenting and labeling an RGB-D image sequence or a global point cloud costs a great deal of human labor. This dissertation proposes a method that generates labels for each image with minimal manual effort. We first choose some frames to be labeled by a human and then propagate these labels to the remaining images with a CRF model. In this way, we obtain semantic labels for the training images with minimal human effort.

The main contributions of this dissertation are:

First, we propose a variety of methods to ensure the robustness, real-time performance, scalability, and autonomy of the RGB-D reconstruction system. For robustness, we combine feature-based and ICP-based methods to register images. For real-time performance, almost all of the algorithms are implemented on the GPU, including corner and ORB feature detection, feature matching, RANSAC, ICP, and marching cubes. For scalability, we use loop detection and pose optimization to eliminate accumulated errors (a sketch of this register-then-optimize pipeline is given below). For autonomy, we deploy our robot KeJia and use its navigation system to build the global map automatically.

Second, we use context information to infer semantic labels. Our CRF model integrates appearance, geometry, scene information, and scene-object, object-object, structure-object, and spatial relationships to jointly infer the scene type, object categories, and structural classes. Through joint inference, we obtain a spatially consistent semantic segmentation.

Third, we propose a method to build temporally consistent semantic maps for indoor scenes. By computing temporal information, modeling it with higher-order potentials, and extending the inference of the CRF model, we improve the accuracy of the semantic map and guarantee its temporal consistency (the general form of such an energy is sketched below).
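As a hedged illustration of the register-then-optimize pipeline above (pairwise alignment, loop-closure edges, pose-graph optimization), the following sketch uses Open3D's multiway-registration API rather than the dissertation's GPU implementation. The voxel size, the exhaustive pairwise loop-closure search, the omission of the feature-based initialization, and helper names such as `pairwise_icp` are assumptions made for the example.

```python
# Minimal pose-graph registration sketch with Open3D.
# Assumes each cloud is voxel-downsampled and has estimated normals
# (point-to-plane ICP requires target normals).
import numpy as np
import open3d as o3d

VOXEL = 0.05  # downsampling resolution in meters (assumed value)

def pairwise_icp(source, target, init=np.identity(4)):
    """Refine an alignment with point-to-plane ICP; return pose + information matrix."""
    result = o3d.pipelines.registration.registration_icp(
        source, target, max_correspondence_distance=VOXEL * 2, init=init,
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPlane())
    info = o3d.pipelines.registration.get_information_matrix_from_point_clouds(
        source, target, VOXEL * 2, result.transformation)
    return result.transformation, info

def build_pose_graph(clouds):
    """Odometry edges link consecutive frames; all other pairs act as loop-closure candidates."""
    graph = o3d.pipelines.registration.PoseGraph()
    odometry = np.identity(4)
    graph.nodes.append(o3d.pipelines.registration.PoseGraphNode(odometry))
    for s in range(len(clouds)):
        for t in range(s + 1, len(clouds)):
            T, info = pairwise_icp(clouds[s], clouds[t])
            if t == s + 1:  # odometry edge between consecutive frames
                odometry = T @ odometry
                graph.nodes.append(
                    o3d.pipelines.registration.PoseGraphNode(np.linalg.inv(odometry)))
                graph.edges.append(o3d.pipelines.registration.PoseGraphEdge(
                    s, t, T, info, uncertain=False))
            else:           # loop-closure edge, marked uncertain so it can be pruned
                graph.edges.append(o3d.pipelines.registration.PoseGraphEdge(
                    s, t, T, info, uncertain=True))
    return graph

def optimize(graph):
    """Globally optimize poses, distributing accumulated drift over the loop."""
    option = o3d.pipelines.registration.GlobalOptimizationOption(
        max_correspondence_distance=VOXEL * 2,
        edge_prune_threshold=0.25, reference_node=0)
    o3d.pipelines.registration.global_optimization(
        graph,
        o3d.pipelines.registration.GlobalOptimizationLevenbergMarquardt(),
        o3d.pipelines.registration.GlobalOptimizationConvergenceCriteria(),
        option)
    return graph
```

The uncertain loop-closure edges are what let the optimizer reject bad matches while still closing loops, which is the role loop detection and pose optimization play in the text above.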
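The abstract names the CRF models but does not write them out. As a hedged illustration, a standard energy with unary, pairwise, and higher-order terms, together with a robust P^n-style clique potential of the kind commonly used to enforce label consistency over corresponding superpixels, would take the following form (all symbols are illustrative, not taken from the dissertation):

```latex
% Energy over pixel/superpixel labels x: unary, pairwise, and
% higher-order (temporal-consistency) terms.
E(\mathbf{x}) = \sum_{i} \psi_i(x_i)
              + \sum_{(i,j) \in \mathcal{N}} \psi_{ij}(x_i, x_j)
              + \sum_{c \in \mathcal{C}} \psi_c(\mathbf{x}_c)

% Robust P^n-style potential over a clique c of temporally
% corresponding superpixels: N_l counts variables in c that
% disagree with label l, Q truncates the penalty.
\psi_c(\mathbf{x}_c) = \min\Big( \gamma_{\max},\;
    \min_{l \in \mathcal{L}} \big[ \gamma_l
        + \tfrac{\gamma_{\max} - \gamma_l}{Q}\, N_l(\mathbf{x}_c) \big] \Big)
```

Minimizing such an energy encourages each clique of corresponding superpixels to agree on a single label, while Q and \gamma_{\max} cap the penalty so that a few genuinely changing superpixels do not dominate the solution.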
Fourth, we propose a method to automatically generate ground truth for RGB-D images and point clouds. We use a greedy algorithm to choose the images to be labeled by a human and a higher-order Dense CRF model to propagate the labels to the remaining point cloud and images (a minimal sketch of the greedy selection follows). Results show that our method effectively reduces the human labeling effort: for a scene containing 1831 images, only 22 manually labeled images achieve 93% label-propagation accuracy.
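The abstract does not specify the greedy criterion, so the sketch below illustrates one plausible reading: a set-cover-style selection in which each candidate frame "covers" the scene regions (e.g., 3D superpixels) it observes, and frames are chosen to maximize newly covered regions. The function name `select_frames` and the region sets are hypothetical, and the Dense CRF propagation step is not shown.

```python
# Greedy set-cover-style selection of frames to label by hand.
# frame_regions maps a frame id to the set of scene regions it observes;
# we repeatedly pick the frame that covers the most regions no selected
# frame covers yet.
def select_frames(frame_regions, budget):
    covered, chosen = set(), []
    for _ in range(budget):
        best = max(frame_regions, key=lambda f: len(frame_regions[f] - covered))
        gain = frame_regions[best] - covered
        if not gain:  # every observable region is already covered
            break
        chosen.append(best)
        covered |= gain
    return chosen

# Example: with a budget of 2 frames, the greedy pick covers all regions.
frames = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6}}
print(select_frames(frames, budget=2))  # -> [0, 2]
```

Under this reading, a handful of well-chosen frames can cover most of the scene, which is consistent with the reported 22-of-1831 result, though the dissertation's actual criterion may differ.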
Keywords/Search Tags: RGB-D Reconstruction, RGB-D Semantic Segmentation, Semantic Mapping, Indoor Scene Understanding, Service Robotics