Vision-Based Layout Estimation In Indoor Scenes

Posted on:2020-10-10

Degree:Doctor

Type:Dissertation

Country:China

Candidate:W D Zhang

GTID:1368330602456088

Subject:Biomedical engineering

Abstract/Summary:

Indoor layout estimation aims to predict the spatial layout of an indoor scene from 2D images. It is valuable for a wide range of tasks such as scene reconstruction, indoor navigation and augmented reality, and therefore has great academic significance. The task has attracted many researchers in recent years, and great progress has been made in both algorithms and accuracy. However, many problems remain unsolved in practical applications. First, the predicted feature maps are usually inaccurate and blurry, and edge and semantic information is under-utilized. Second, most existing layout estimation methods are time-consuming, which restricts their practical use. Third, existing research is based on a 2D representation of the layout, from which only limited spatial information can be obtained. These problems harm the accuracy, runtime and practical value of layout estimation. This dissertation studies each of these problems and presents a solution to each. The main contents and contributions are summarized as follows:

1. To address the low quality of the predicted feature maps and the under-utilization of edge and semantic information, we propose to learn edge and semantic information jointly, and we design a deep encoder-decoder network that produces accurate and clear feature maps. The novelty and contributions can be summarized as:
(1) We present an encoder-decoder network that yields high-quality feature map predictions. The encoder aggregates global context, and the decoder produces accurate and clear feature maps in a coarse-to-fine manner.
(2) We present a joint learning method for edge and semantic information. Two separate decoders predict the edge map and the semantic labels respectively, which yields accurate and relatively independent edge and semantic estimates.
(3) Based on the edge estimates and geometric constraints, we propose a modified adaptive ray sampling method for generating layout hypotheses. In addition, to exploit the similarity of interior spatial organization, we propose a method that generates layout proposals by searching a predefined pool.
(4) We propose a pixel-level refinement algorithm to further reduce the error rate. The pixels neighboring each keypoint are searched, and the current layout is iteratively replaced with a better layout hypothesis. The refinement significantly increases the accuracy of layout estimation.

2. To address the slow running speed of layout estimation, we propose an end-to-end framework that directly predicts the layout type and keypoint coordinates of an input image. The novelty and contributions can be summarized as:
(1) We propose an end-to-end learning framework for layout estimation that is both effective and fast. The method uses the edge map to bridge two mappings, dividing the problem in two: predicting the edge map from the input image, and predicting the layout type and keypoint coordinates from the edge map.
(2) We solve the problem that the original training data is insufficient and imbalanced across layout types. First, we generate sufficient and balanced layout samples by sampling rays from the vanishing points. Then a generative adversarial network is introduced to modify the artificial layout samples so that their style resembles that of the predicted edge maps. Finally, the network that predicts the layout type and keypoint coordinates is trained on the modified artificial samples. The network is therefore able to output the parameterized layout estimate directly, and the speed is significantly improved.

3. We introduce the topic of estimating the 3D layout of the input image to overcome the limitations of the current 2D layout estimation task. We annotate and generate the first dataset for 3D layout estimation and propose an algorithm specific to the task. The novelty and contributions can be summarized as:
(1) We are the first to present the task of 3D layout estimation, in which the layout is represented by the depth of indoor planes such as the floor and walls. Using the camera parameters, the depth map can be converted to a 3D point cloud that represents the 3D structure of the indoor layout.
(2) Based on the camera projection model, we derive the general equation of a plane in a depth map: the reciprocal of the depth value is linear with respect to the pixel coordinates.
(3) Based on this equation and an existing RGB-D dataset of indoor scenes, we annotate and generate the first 3D layout estimation dataset.
(4) We propose a learning strategy for 3D layout estimation. The target depth map is decomposed into multiple parametric maps, which are composed of local planes, and the scale factor is further separated from the parametric maps. In this way, the non-linear task of depth estimation is converted into the task of plane estimation. The proposed method greatly reduces the learning difficulty and is beneficial for layout estimation.
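
The plane equation in contribution 3(2) and the depth-to-point-cloud conversion in 3(1) can be illustrated with a minimal sketch. It is not the dissertation's implementation. It assumes an ideal pinhole camera with intrinsics fx, fy, cx, cy and a plane written as a*X + b*Y + c*Z = d in camera coordinates; the function names and the numeric values are hypothetical. Substituting X = (u - cx) * Z / fx and Y = (v - cy) * Z / fy into the plane equation and dividing by Z * d gives 1/Z = (a * (u - cx) / fx + b * (v - cy) / fy + c) / d, which is linear in the pixel coordinates (u, v).

```python
import numpy as np


def plane_inverse_depth(n, d, fx, fy, cx, cy, height, width):
    """Inverse depth 1/Z of the plane n . X = d at every pixel (pinhole model)."""
    a, b, c = n
    # v indexes rows, u indexes columns.
    v, u = np.mgrid[0:height, 0:width].astype(np.float64)
    return (a * (u - cx) / fx + b * (v - cy) / fy + c) / d


def backproject(depth, fx, fy, cx, cy):
    """Convert a depth map (H x W) into an (H*W) x 3 point cloud."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)


# Hypothetical example: a slightly slanted wall in front of the camera.
normal = (0.2, 0.0, 1.0)   # assumed plane normal (not normalized)
offset = 3.0               # assumed plane offset d
fx = fy = 500.0            # assumed focal lengths in pixels
cx, cy = 320.0, 240.0      # assumed principal point

inv_depth = plane_inverse_depth(normal, offset, fx, fy, cx, cy, 480, 640)
points = backproject(1.0 / inv_depth, fx, fy, cx, cy)

# Every back-projected point should satisfy the plane equation again.
max_residual = np.abs(points @ np.array(normal) - offset).max()
print("max plane residual:", max_residual)
```

Because the inverse depth of a plane is linear in (u, v), a planar region is described by three coefficients per plane rather than by a free depth value per pixel, which is consistent with the abstract's point that depth estimation can be recast as plane estimation.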

Keywords/Search Tags:

Indoor layout estimation, Scene understanding, Encoder-decoder network, Generative adversarial network, Depth estimation

Related items

1	Layout Estimation Of Indoor Scenes Based On Conditional Generation Adversarial Networks
2	Spatial Layout Estimation Of Indoor Scene Using Informative Edges And Multi-modality Features
3	Indoor Scene Understanding Based On Convolutional Neural Network And 3D Geometric Context Information
4	A Coarse-to-fine Estimation Of Spatial Layout Of Indoor Scenes
5	A Study On Single Image Depth Estimation And Sparse Depth Completion Based On Adversarial Learning
6	Deep Learning Based Monocular Scene Depth Estimation Algorithm
7	Research On 3D Indoor Scene Technology For Video Sequences
8	RSS Missing Value Estimation With Adaptive Context Generative Adversarial Networks Model
9	Unsupervised Generative Adversarial Learning For 3D Scene Flow From Stereo Images
10	Deep Neural Network Based Depth Recovery From Multi-Modality Input