
RGBD Based Semantic Scene Completion For 3D Environments

Posted on: 2021-09-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Li
Full Text: PDF
GTID: 1488306755460404
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Three-dimensional (3D) scene understanding is an important research area in computer vision and robotics: it enables intelligent robots to perceive and analyze the 3D world. Semantic scene completion (SSC), which combines semantic segmentation and shape completion of 3D scenes, is an emerging and popular topic in indoor 3D scene understanding. Scene completion aims to overcome the limitations of object occlusion and viewpoint and to recover information such as object category, position, and shape from a single-view image. Semantic segmentation aims to achieve a high-level semantic understanding of the indoor environment by assigning a category label to every object in the scene. Because the diverse objects vary greatly in size, shape, and layout, and their visibility changes with viewpoint and occlusion, semantic scene completion is a challenging task. This thesis applies deep convolutional neural networks to RGBD data (paired RGB and depth images) and focuses on developing efficient and accurate networks, proposing several solutions to pressing problems in the semantic scene completion task. The main contributions of this thesis can be summarized in five parts:

1. A hybrid model that combines 2D and 3D convolutional neural networks for semantic scene completion. The network exploits TSDF-encoded 3D grids, which are invariant to viewpoint projection, and additionally uses a 2D convolutional network to extract information from the raw depth map, effectively reducing the loss of fine-grained information during voxelization. The proposed 2D-3D feature projection algorithm joins the 2D and 3D networks, unifying the hybrid network in 3D space to predict a dense volumetric semantic scene. This hybrid framework plays a vital role in the subsequent studies in this thesis.

2. A position-importance-aware loss function (PA-Loss) that improves the network's ability to perceive voxels at essential positions during training. Existing methods do not account for the differing importance of voxels at different locations in the scene, and therefore pay insufficient attention to voxels on surfaces, edges, and corners, which carry more geometric information than voxels inside objects. The thesis defines local geometric anisotropy to measure the importance of voxels at different positions and builds PA-Loss on it. During training, the rare, highly weighted voxels on object surfaces and corners contribute more than the redundant voxels inside objects, so the proposed loss helps recover critical details such as object surfaces and scene corners.

3. A dimensional decomposition residual (DDR) block that replaces conventional 3D convolution. The DDR module dramatically reduces model parameters and computational cost without degrading performance: its carefully designed structure turns the parameter growth rate from cubic to linear in the kernel size. A lightweight SSC network is constructed from DDR blocks, in which color-image and depth-map features are fused seamlessly at multiple scales, enhancing the network's representation ability and boosting both shape completion and semantic segmentation.

4. A gated recurrent fusion (GRF) module for fusing color-image and depth-map features. Through its "gate" and "memory" components, GRF performs valid selection and fusion across the two modalities. To the best of our knowledge, this is the first use of a gated recurrent network for data fusion in the SSC task. In addition, an end-to-end 3D-GRF-based network, GRFNet, is presented for fusing RGB and depth information in the SSC task, and both single-stage and multi-stage integration strategies are proposed within the GRFNet framework.

5. A novel anisotropic convolution (AIC) that removes the limitation of a fixed convolution kernel size. The AIC module adapts to the dimensional anisotropy of the scene voxel-wise and thus implicitly provides 3D kernels of varying size; its modulation parameters are learned to adjust the kernel scale for each voxel. Compared with a standard 3D convolution unit, the module demands far less computation with higher parameter efficiency, and it can serve as a plug-and-play replacement for standard 3D convolution. Stacking multiple anisotropic convolution modules further strengthens voxel-wise modeling while keeping the number of model parameters controllable.

Extensive experiments and evaluations on the benchmarks demonstrate the superiority of the proposed methods, which achieve state-of-the-art performance.
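The abstract does not reproduce the PA-Loss formula, but the idea in contribution 2 can be sketched: a voxel's local geometric anisotropy can be measured as the number of 6-connected neighbours whose label differs from its own, so interior voxels score 0 while surface, edge, and corner voxels score progressively higher. The weighting constant `k` and the edge-padding choice below are illustrative assumptions, not the thesis's exact definition.

```python
import numpy as np

def local_geometric_anisotropy(labels):
    """Count, for each voxel, the 6-connected neighbours with a different
    label. Interior voxels score 0; surface voxels score higher, and
    edge/corner voxels higher still."""
    lga = np.zeros(labels.shape, dtype=np.int32)
    padded = np.pad(labels, 1, mode="edge")  # replicate labels at the border
    centre = padded[1:-1, 1:-1, 1:-1]
    for axis in range(3):
        for shift in (-1, 1):
            neighbour = np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
            lga += (neighbour != centre).astype(np.int32)
    return lga

def pa_weights(labels, k=0.5):
    """Per-voxel loss weights: geometrically informative voxels get more."""
    return 1.0 + k * local_geometric_anisotropy(labels)
```

Multiplying a per-voxel cross-entropy map by `pa_weights(gt_labels)` then gives a loss in the spirit of PA-Loss, where surface and corner voxels dominate the gradient.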
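The cubic-to-linear claim in contribution 3 is easy to verify with parameter counting. Assuming the decomposition replaces one dense k×k×k convolution with three 1-D convolutions of shapes (1,1,k), (1,k,1), (k,1,1) — a common layout, though the thesis's exact DDR block may differ — the weight count drops from growing as k³ to growing as k:

```python
def conv3d_params(c_in, c_out, k):
    """Weights of a dense k*k*k 3D convolution (bias ignored)."""
    return c_in * c_out * k ** 3

def ddr_params(c_in, c_out, k):
    """Weights after decomposing into three 1-D convolutions,
    (1,1,k) -> (1,k,1) -> (k,1,1): growth in k is linear, not cubic."""
    return c_in * c_out * k + 2 * c_out * c_out * k

# For a 3x3x3 convolution with 64 channels in and out:
# dense:      64 * 64 * 27 = 110592 weights
# decomposed: 12288 + 24576 = 36864 weights (3x fewer)
```

The gap widens quickly: at k = 5 the dense kernel needs 125 weights per channel pair while the decomposed form needs 15.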
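The "gate" component of contribution 4 can likewise be sketched. The following is a GRU-flavoured single-step fusion, not the exact GRFNet equations (the weight shapes and the recurrent "memory" path are simplified assumptions): a learned sigmoid gate decides, element-wise, how much of each modality enters the fused feature.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(f_rgb, f_depth, w_r, w_d, b):
    """One gated fusion step between RGB and depth features.

    z is a selection gate in (0, 1) computed from both modalities;
    the fused feature interpolates between them voxel by voxel."""
    z = sigmoid(f_rgb @ w_r + f_depth @ w_d + b)
    return z * f_rgb + (1.0 - z) * f_depth
```

With all-zero gate weights the gate is 0.5 everywhere and the module degenerates to simple averaging; training the weights lets it favour whichever modality is more reliable at each position, which is the point of gated over fixed fusion.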
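Finally, the "implicit varying kernel size" of contribution 5 can be illustrated in miniature. Assuming (hypothetically — the thesis's AIC formulation is not given here) that each voxel blends the responses of kernels of several fixed sizes with learned softmax modulation weights, each voxel effectively sees its own kernel scale:

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def anisotropic_mix(responses, modulation):
    """Blend responses of kernels of different sizes per voxel.

    responses:  (n_sizes, D, H, W) feature maps, one per candidate size
    modulation: (n_sizes, D, H, W) learned logits; softmax over sizes
    gives each voxel its own soft choice of kernel scale."""
    w = softmax(modulation, axis=0)
    return (w * responses).sum(axis=0)
```

When the modulation logits strongly favour one candidate size at a voxel, the blend collapses to that kernel's response there, which is the "plug-and-play varying kernel" behaviour in soft form.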
Keywords/Search Tags:3D Scene Understanding, Semantic Scene Completion, Shape Completion, Semantic Segmentation, RGBD Data, Convolutional Neural Network