
3D Scene And Object Reconstruction From Multiple Sources And Viewpoints

Posted on: 2022-07-30 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: H Z Xie | Full Text: PDF
GTID: 1488306569487224 | Subject: Computer Science and Technology
Abstract/Summary:
Endowing machines with the ability to perceive the real world in 3D, as humans do, is a fundamental and longstanding topic in artificial intelligence. Motivated by human cognition, this dissertation studies the reconstruction of 3D scenes and objects from multiple sources and viewpoints. By leveraging prior shape knowledge, the proposed method reconstructs the 3D shape of an object from a single RGB or depth image, and the results are refined incrementally as the number of input images increases. 3D reconstruction has many applications, including computer-aided design, mixed reality, robotics, and autonomous driving.

An analysis of current research shows that existing 3D reconstruction methods face three main challenges. First, they typically require scanning all surfaces of an object before reconstruction, which is not always feasible in practice. Second, they reconstruct 3D structure from color images or depth images alone and therefore cannot make full use of data from different modalities and viewpoints. This matters because each modality has blind spots: feature matching on RGB images fails for objects with weak or repeated textures, and depth cannot be obtained for objects that do not reflect the depth sensor's signal. Third, they lack semantics, so the reconstructed objects and the background are mixed together, which makes it difficult to separate the objects from the reconstructed scene. To address these three problems, the dissertation works at three levels: single-view 3D object reconstruction, multi-view 3D object reconstruction, and multi-view 3D scene reconstruction. The main contributions are summarized as follows.

First, three geometry-structure-aware single-view 3D object reconstruction methods are proposed for monocular RGB cameras, stereo RGB cameras, and depth cameras, respectively, to address the inability of existing methods to restore the invisible parts of an object's 3D structure. These methods recover the 3D shape of the invisible parts of objects by leveraging known colors, structure, and priors. For monocular RGB cameras, a geometry prior network learns geometric priors from large-scale 3D datasets and implicitly establishes a mapping between image space and 3D model space. For stereo RGB cameras, a depth-aware network estimates a depth map using the constraints between the two views, which better preserves the detailed 3D structure of the object when its complete 3D shape is reconstructed. For depth cameras, a gridding residual network takes a 3D grid as an intermediate representation of geometric structure, so that context information is fully exploited during computation and the geometric structures captured by the depth camera are better preserved. Experimental results on the ShapeNet, Pix3D, and KITTI datasets show that the three proposed methods recover the complete 3D shape of an object from a single-view image and outperform existing 3D reconstruction methods by 3% to 18%.

Second, a multi-source, multi-view 3D object reconstruction method based on multi-scale context-aware fusion is proposed to address the failure of existing methods to make full use of data from different modalities and viewpoints. On the one hand, different data modalities differ in their robustness to objects made of different materials. For example, it is difficult to recover the 3D structure of weak- or repeated-texture objects from multi-view RGB images, and it is equally difficult for depth cameras to capture objects that do not reflect their signal. On the other hand, different parts of an object are visible from different viewpoints, and the reconstruction quality of the visible parts is much higher than that of the invisible parts. Inspired by this observation, a multi-scale context-aware fusion module adaptively selects, for each part, a high-quality reconstruction from among the 3D shapes generated from different viewpoints or cameras, and fuses the selected reconstructions into a 3D shape of the whole object. Experimental results on the ShapeNet, Pix3D, and Things3D datasets show that the proposed method outperforms the state-of-the-art method by 4% to 20% and is up to 7 times faster than existing methods.

Finally, a semantic-aware multi-view 3D scene reconstruction method is proposed, which makes it easier to separate 3D objects from the reconstructed scene. By modeling the semantics of the scene, the method reconstructs the 3D scene and the complete shapes of the 3D objects in it simultaneously. To perceive scene semantics, a regional memory network is proposed for video object segmentation; it separates the objects in the image sequence and better distinguishes objects with similar appearances. Building on this, semantic-modeling-based multi-view 3D scene reconstruction reconstructs the scene and the objects within it by recovering the complete 3D shape of each object and estimating its position and pose. Experimental results on the SUN3D dataset and on-site videos show that the proposed method produces better reconstructions of 3D scenes and objects than existing methods.

Through these studies, the dissertation explores 3D scene and object reconstruction in depth and provides feasible and effective solutions to key technical issues in real-world scenes. Starting from single-view single-object reconstruction, it progresses to multi-view single-object reconstruction and then to multi-view multi-object reconstruction. Addressing the three problems of existing 3D reconstruction methods, the proposed approach reconstructs 3D scenes while perceiving their semantics, recovers more complete object shapes, and makes the reconstructed objects easier to separate from the reconstructed scenes. The proposed methods are also more robust when recovering the 3D shapes of weak- or repeated-texture objects and of objects that do not reflect the depth sensor's signal.
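The idea behind the context-aware fusion module, adaptively picking the best-reconstructed view for each part of the object, can be illustrated with a minimal single-scale sketch. It rests on an assumption not stated in the abstract: each view yields an occupancy volume plus a per-voxel confidence score, and fusion is a per-voxel softmax over views followed by a weighted sum. The function name `context_aware_fuse` and the flattened-list volume layout are hypothetical; in a learned model the scores would be predicted by a network, while here they are simply passed in.

```python
import math
from typing import List


def context_aware_fuse(volumes: List[List[float]], scores: List[List[float]]) -> List[float]:
    """Fuse per-view occupancy volumes using per-voxel softmax weights (hypothetical sketch).

    volumes[i][v]: occupancy probability of voxel v reconstructed from view i.
    scores[i][v]:  unnormalized confidence of view i at voxel v (assumed given here;
                   a learned model would predict these from context features).
    """
    if not volumes or len(volumes) != len(scores):
        raise ValueError("expected one score map per volume")
    n_views, n_voxels = len(volumes), len(volumes[0])
    fused = []
    for v in range(n_voxels):
        # Softmax across views at this voxel; subtracting the max avoids overflow.
        s = [scores[i][v] for i in range(n_views)]
        m = max(s)
        e = [math.exp(x - m) for x in s]
        z = sum(e)
        # Weighted sum: views that are confident about this voxel dominate it.
        fused.append(sum(e[i] / z * volumes[i][v] for i in range(n_views)))
    return fused


# Two views, two voxels: view 0 is far more confident at voxel 0,
# while both views are equally confident at voxel 1.
fused = context_aware_fuse([[0.9, 0.2], [0.1, 0.6]], [[10.0, 0.0], [-10.0, 0.0]])
print(fused)
```

In this toy call the first voxel essentially takes view 0's value, and the second voxel is the plain average of both views. The multi-scale aspect named in the abstract is not modeled here.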
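The gridding residual network's use of a 3D grid as an intermediate representation for depth-camera point clouds can be sketched as follows. This assumes one plausible gridding scheme, not necessarily the dissertation's: each point spreads trilinear interpolation weights over the eight vertices of the grid cell containing it, and each vertex stores the mean weight of the points that touched it. The function `gridding`, the resolution parameter `res`, and the requirement that coordinates are already normalized to [0, res - 1] are all hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]
Grid = List[List[List[float]]]


def gridding(points: List[Point], res: int) -> Grid:
    """Convert a point cloud into a res x res x res grid of vertex values (hypothetical sketch).

    Coordinates are assumed to lie in [0, res - 1]. Each point spreads trilinear
    weights over the 8 vertices of its cell; each vertex keeps the mean weight.
    """
    if res < 2:
        raise ValueError("res must be at least 2")
    total = [[[0.0] * res for _ in range(res)] for _ in range(res)]
    count = [[[0] * res for _ in range(res)] for _ in range(res)]
    for x, y, z in points:
        # Lower corner of the enclosing cell, clamped so the upper corner stays in range.
        x0, y0, z0 = (min(max(int(math.floor(c)), 0), res - 2) for c in (x, y, z))
        for vx in (x0, x0 + 1):
            for vy in (y0, y0 + 1):
                for vz in (z0, z0 + 1):
                    # Trilinear weight: larger when the point is closer to this vertex.
                    w = (1 - abs(x - vx)) * (1 - abs(y - vy)) * (1 - abs(z - vz))
                    total[vx][vy][vz] += w
                    count[vx][vy][vz] += 1
    # Average the accumulated weights; vertices no point touched stay at 0.
    return [
        [
            [total[i][j][k] / count[i][j][k] if count[i][j][k] else 0.0 for k in range(res)]
            for j in range(res)
        ]
        for i in range(res)
    ]


# A point at the center of a single cell contributes equally to all 8 corners.
grid = gridding([(0.5, 0.5, 0.5)], 2)
```

Because the weights are trilinear, a lone point's contributions to its eight vertices sum to 1, so the grid encodes where points lie within each cell rather than just whether a cell is occupied.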
Keywords/Search Tags: 3D reconstruction, 3D object reconstruction, scene semantic aware, multi-scale context aware, geometry structure aware