
Accurate 3D Reconstruction For Complex Scenes Based On Multi-View Images

Posted on: 2022-08-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Liao
GTID: 1488306497989849
Subject: Computer application technology
Abstract/Summary:
With the popularity of portable cameras such as smartphones, digital cameras, and action cameras, taking pictures has become increasingly convenient, and the number of images stored on devices and in the cloud has grown explosively. At the same time, virtual reality, augmented reality, 3D maps, and other emerging technologies have raised the demand for both the quantity and the quality of 3D models. Given calibrated multi-view images, Multi-View Stereo (MVS) can recover a dense 3D representation of the captured scene, which meets the demand for 3D models using easily acquired image data. MVS places few requirements on hardware and operators and suits a wide variety of scenes, which gives it high practical value and has led to wide use in fields such as the digitization of cultural relics, 3D map construction, and drone-based inspection.

Although advances in computing and in the theory of multi-view geometry have driven progress in MVS in recent years, some critical problems remain when MVS is applied to the complex scenes of daily life. First, traditional MVS methods use a patch model to approximate scene surfaces. Many everyday objects, such as flowers and trees, are hard to fit with a patch model, so stereo matching and reconstruction results are poor in these regions. Second, traditional MVS methods estimate geometry by optimizing the patch model to maximize the photometric consistency between the projections of the patch onto multiple views. Because photometric consistency cannot distinguish between patches in textureless regions, reconstruction in such regions is usually poor. Last, although emerging learning-based MVS networks are more robust in specular and mirror-like regions, which are intractable for traditional MVS, their high memory consumption limits output resolution and precision, which restricts their use in everyday applications.

Focusing on accurate 3D reconstruction of complex scenes from multi-view images, this thesis addresses these practical problems. First, it studies stereo matching on complex scene surfaces and proposes a folding patch model that fits such surfaces better, addressing the poor reconstruction of surfaces such as trees and flowers. Then, it studies geometry estimation for untextured regions and proposes constraining the geometry estimates of neighboring pixels through local consistency, which addresses 3D reconstruction in untextured regions. Finally, it addresses the scalability of MVS networks by using adaptive depth estimation in a coarse-to-fine pyramid architecture to avoid unnecessary computation on insignificant regions, which greatly reduces the memory consumption of the MVS network and improves the scalability of 3D reconstruction.

The main research contents and innovations of this thesis are summarized as follows:

1. A folding patch model and a corresponding stereo matching strategy are proposed, which significantly improve reconstruction results, especially on complex scene surfaces. The folding patch model is constructed by folding the traditional patch model along its middle line, so that it can better approximate round and folded surfaces. This thesis proposes modifications to the initialization, propagation, and optimization procedures of patch-based MVS in order to integrate the folding patch model. With the folding patch model integrated, the reconstruction of complex scene surfaces is improved.

2. A pyramid MVS method based on local consistency is proposed, which can efficiently reconstruct textureless regions. As noted above, photometric consistency cannot distinguish between patch hypotheses in textureless regions, so reconstruction there is usually poor. First, this thesis modifies the photometric consistency metric to make it suitable for PatchMatch in untextured regions. Then, it proposes a local consistency term that encourages the depth and normal hypotheses of neighboring pixels with similar colors to be consistent. Last, it introduces a pyramid architecture, which speeds up the convergence of the local consistency and provides multi-scale information for more robust geometry estimation.

3. A pyramid MVS network based on adaptive depth estimation is proposed, which can perform efficient high-resolution depth estimation with low memory consumption. With the pyramid architecture, the proposed MVS network gradually refines the depth map to the desired resolution in a coarse-to-fine fashion. Unlike previous methods, which estimate depth values for all pixels at each level of the pyramid, the proposed network refines depth values only at a small set of locations where the previous predictions are likely to be incorrect, which avoids computation on regions that do not need it. To estimate depth at these sparsely selected locations, this thesis proposes a lightweight pixelwise depth estimation module, which estimates the depth of each location independently.

All three contributions focus on applying MVS in daily life: they address the reconstruction of complex scene surfaces and untextured regions for traditional MVS, and the large GPU memory consumption of learning-based MVS. Experiments on various datasets validate the effectiveness of the proposed methods, which can be applied in fields such as 3D map construction and cultural relic digitization. The proposed methods are of high practical value, and the corresponding theory is relevant to related fields in computer graphics and computer vision.
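The coarse-to-fine idea behind the third contribution (refine depth only where the previous level's prediction looks unreliable) can be illustrated with a minimal NumPy sketch. This is not the thesis's network. The confidence maps, the fixed threshold, the nearest-neighbour upsampling, and the `refine_fn` callback are all assumptions introduced here for illustration; in the thesis, a learned lightweight pixelwise module plays the role of `refine_fn`.

```python
import numpy as np


def upsample_nearest(depth, factor=2):
    """Nearest-neighbour upsampling of a 2D depth map (assumed scheme)."""
    return np.repeat(np.repeat(depth, factor, axis=0), factor, axis=1)


def select_uncertain(confidence, threshold):
    """Hypothetical selection rule: pixels whose confidence is below threshold."""
    return confidence < threshold


def refine_sparse(depth, mask, refine_fn):
    """Re-estimate depth only where mask is True.

    Each selected pixel is handled independently; all other pixels keep
    the value inherited from the coarser level.
    """
    refined = depth.copy()
    ys, xs = np.nonzero(mask)
    for y, x in zip(ys, xs):
        refined[y, x] = refine_fn(y, x, depth[y, x])
    return refined, len(ys)


def coarse_to_fine(coarse_depth, confidences, refine_fn, threshold=0.5):
    """Walk up the pyramid, refining only low-confidence pixels per level.

    `confidences` holds one map per finer level, each matching the
    upsampled depth resolution at that level. Returns the final depth map
    and the fraction of pixels that were actually re-estimated.
    """
    depth = coarse_depth
    visited, total = 0, 0
    for conf in confidences:
        depth = upsample_nearest(depth)
        mask = select_uncertain(conf, threshold)
        depth, n = refine_sparse(depth, mask, refine_fn)
        visited += n
        total += depth.size
    return depth, visited / total


if __name__ == "__main__":
    coarse = np.ones((2, 2))
    conf_4 = np.ones((4, 4))
    conf_4[0, 0] = 0.1          # one unreliable pixel at the 4x4 level
    conf_8 = np.ones((8, 8))
    conf_8[7, 7] = 0.0          # two unreliable pixels at the 8x8 level
    conf_8[3, 4] = 0.2
    # Placeholder refinement: nudge the inherited depth by a fixed offset.
    result, fraction = coarse_to_fine(
        coarse, [conf_4, conf_8], lambda y, x, d: d + 1.0
    )
    print(result.shape, fraction)
```

The returned fraction makes the saving concrete: only the selected pixels at each level reach the refinement step, while a dense pyramid would process every pixel at every level.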
Keywords/Search Tags: 3D reconstruction, multi-view geometry, stereo matching, multi-view stereo