
Towards 3D Reconstruction And Semantic Mapping For Indoor Scenes

Posted on: 2017-02-11    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Z Zhao    Full Text: PDF
GTID: 1108330491960000    Subject: Computer application technology
Abstract/Summary:
Scene perception and understanding is a long-term goal of computer vision, artificial intelligence, and intelligent robotics. Recently, with the development of depth sensors (e.g., Kinect), indoor scene understanding has attracted considerable attention. This dissertation focuses on indoor scene reconstruction and understanding, covering RGB-D reconstruction, RGB-D semantic segmentation, and semantic mapping, as follows:

First, we register RGB-D images and reconstruct the global scene. To build a large-scale 3D map, image alignment techniques register the images, while loop detection and pose optimization eliminate accumulated error between them. Finally, we represent the 3D scene as a point cloud or a 3D mesh.

Second, we propose a novel RGB-D semantic segmentation approach whose goal is to assign semantic labels to every pixel of the image. In our work, each pixel receives two kinds of semantic labels: an object category and a structural class. The structural class gives a clear view of the structure of the indoor scene, while the object category provides a more detailed view. A conditional random field (CRF) model infers the semantic segmentation of each RGB-D image.

Third, by combining RGB-D reconstruction with RGB-D semantic segmentation, we build semantic maps for indoor scenes. However, the semantic segmentations of the individual RGB-D images are not temporally consistent. We obtain temporal information by computing correspondences between superpixels and model it through higher-order potentials. Using this higher-order CRF model, we obtain a temporally consistent semantic map.

Fourth, manually segmenting and labeling an RGB-D image sequence or a global point cloud costs a great deal of human labor. This dissertation proposes a method that generates labels for each image with minimal manual effort. We first choose some frames to be labeled by a human and then propagate these labels to the remaining images with a CRF model. In this way, we obtain semantic labels for the training images with minimal human effort.

The main contributions of this dissertation are:

First, we propose a variety of methods to ensure the robustness, real-time performance, scalability, and autonomy of the RGB-D reconstruction system. For robustness, we combine feature-based and ICP-based methods to register images. For real-time performance, almost all of the algorithms are implemented on the GPU, including corner and ORB feature detection, feature matching, RANSAC, ICP, and marching cubes. For scalability, we use loop detection and pose optimization to eliminate accumulated errors (a sketch of this register-then-optimize pipeline is given below). For autonomy, we deploy our robot KeJia and use its navigation system to build the global map automatically.

Second, we use context information to infer semantic labels. Our CRF model integrates appearance, geometry, scene information, and scene-object, object-object, structure-object, and spatial relationships to jointly infer the scene type, object categories, and structural classes. Through joint inference, we obtain a spatially consistent semantic segmentation.

Third, we propose a method to build temporally consistent semantic maps for indoor scenes. By computing temporal information, modeling it with higher-order potentials, and extending the inference of the CRF model, we improve the accuracy of the semantic map and guarantee its temporal consistency (the general form of such an energy is sketched below).
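As a hedged illustration of the register-then-optimize pipeline above (pairwise alignment, loop-closure edges, pose-graph optimization), the following sketch uses Open3D's multiway-registration API rather than the dissertation's GPU implementation. The voxel size, the exhaustive pairwise loop-closure search, the omission of the feature-based initialization, and helper names such as `pairwise_icp` are assumptions made for the example.

```python
# Minimal pose-graph registration sketch with Open3D.
# Assumes each cloud is voxel-downsampled and has estimated normals
# (point-to-plane ICP requires target normals).
import numpy as np
import open3d as o3d

VOXEL = 0.05  # downsampling resolution in meters (assumed value)

def pairwise_icp(source, target, init=np.identity(4)):
    """Refine an alignment with point-to-plane ICP; return pose + information matrix."""
    result = o3d.pipelines.registration.registration_icp(
        source, target, max_correspondence_distance=VOXEL * 2, init=init,
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPlane())
    info = o3d.pipelines.registration.get_information_matrix_from_point_clouds(
        source, target, VOXEL * 2, result.transformation)
    return result.transformation, info

def build_pose_graph(clouds):
    """Odometry edges link consecutive frames; all other pairs act as loop-closure candidates."""
    graph = o3d.pipelines.registration.PoseGraph()
    odometry = np.identity(4)
    graph.nodes.append(o3d.pipelines.registration.PoseGraphNode(odometry))
    for s in range(len(clouds)):
        for t in range(s + 1, len(clouds)):
            T, info = pairwise_icp(clouds[s], clouds[t])
            if t == s + 1:  # odometry edge between consecutive frames
                odometry = T @ odometry
                graph.nodes.append(
                    o3d.pipelines.registration.PoseGraphNode(np.linalg.inv(odometry)))
                graph.edges.append(o3d.pipelines.registration.PoseGraphEdge(
                    s, t, T, info, uncertain=False))
            else:           # loop-closure edge, marked uncertain so it can be pruned
                graph.edges.append(o3d.pipelines.registration.PoseGraphEdge(
                    s, t, T, info, uncertain=True))
    return graph

def optimize(graph):
    """Globally optimize poses, distributing accumulated drift over the loop."""
    option = o3d.pipelines.registration.GlobalOptimizationOption(
        max_correspondence_distance=VOXEL * 2,
        edge_prune_threshold=0.25, reference_node=0)
    o3d.pipelines.registration.global_optimization(
        graph,
        o3d.pipelines.registration.GlobalOptimizationLevenbergMarquardt(),
        o3d.pipelines.registration.GlobalOptimizationConvergenceCriteria(),
        option)
    return graph
```

The uncertain loop-closure edges are what let the optimizer reject bad matches while still closing loops, which is the role loop detection and pose optimization play in the text above.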
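The abstract names the CRF models but does not write them out. As a hedged illustration, a standard energy with unary, pairwise, and higher-order terms, together with a robust P^n-style clique potential of the kind commonly used to enforce label consistency over corresponding superpixels, would take the following form (all symbols are illustrative, not taken from the dissertation):

```latex
% Energy over pixel/superpixel labels x: unary, pairwise, and
% higher-order (temporal-consistency) terms.
E(\mathbf{x}) = \sum_{i} \psi_i(x_i)
              + \sum_{(i,j) \in \mathcal{N}} \psi_{ij}(x_i, x_j)
              + \sum_{c \in \mathcal{C}} \psi_c(\mathbf{x}_c)

% Robust P^n-style potential over a clique c of temporally
% corresponding superpixels: N_l counts variables in c that
% disagree with label l, Q truncates the penalty.
\psi_c(\mathbf{x}_c) = \min\Big( \gamma_{\max},\;
    \min_{l \in \mathcal{L}} \big[ \gamma_l
        + \tfrac{\gamma_{\max} - \gamma_l}{Q}\, N_l(\mathbf{x}_c) \big] \Big)
```

Minimizing such an energy encourages each clique of corresponding superpixels to agree on a single label, while Q and \gamma_{\max} cap the penalty so that a few genuinely changing superpixels do not dominate the solution.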
Fourth, we propose a method to automatically generate ground truth for RGB-D images and point clouds. We use a greedy algorithm to choose the images to be labeled by a human and a higher-order Dense CRF model to propagate the labels to the remaining point cloud and images (a minimal sketch of the greedy selection follows). Results show that our method effectively reduces the human labeling effort: for a scene containing 1831 images, only 22 manually labeled images achieve 93% label-propagation accuracy.
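The abstract does not specify the greedy criterion, so the sketch below illustrates one plausible reading: a set-cover-style selection in which each candidate frame "covers" the scene regions (e.g., 3D superpixels) it observes, and frames are chosen to maximize newly covered regions. The function name `select_frames` and the region sets are hypothetical, and the Dense CRF propagation step is not shown.

```python
# Greedy set-cover-style selection of frames to label by hand.
# frame_regions maps a frame id to the set of scene regions it observes;
# we repeatedly pick the frame that covers the most regions no selected
# frame covers yet.
def select_frames(frame_regions, budget):
    covered, chosen = set(), []
    for _ in range(budget):
        best = max(frame_regions, key=lambda f: len(frame_regions[f] - covered))
        gain = frame_regions[best] - covered
        if not gain:  # every observable region is already covered
            break
        chosen.append(best)
        covered |= gain
    return chosen

# Example: with a budget of 2 frames, the greedy pick covers all regions.
frames = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6}}
print(select_frames(frames, budget=2))  # -> [0, 2]
```

Under this reading, a handful of well-chosen frames can cover most of the scene, which is consistent with the reported 22-of-1831 result, though the dissertation's actual criterion may differ.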
Keywords/Search Tags: RGB-D Reconstruction, RGB-D Semantic Segmentation, Semantic Mapping, Indoor Scene Understanding, Service Robotics