
Learning-Based Multi-View Stereo

Posted on: 2022-05-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K Y Luo
GTID: 1488306572973809
Subject: Computer application technology
Abstract/Summary:
Multi-View Stereo (MVS) is an important and fundamental research topic in machine vision. It has great economic value and is widely used in many fields, such as surveying, engineering construction, and autonomous driving. With the rapid development of society, the performance demanded of MVS in these fields keeps increasing, yet existing MVS methods have difficulty reconstructing the hard regions of a scene, such as weak-textured, specular, and reflective regions. Improving the performance of MVS is therefore a pressing problem. In recent years, machine learning has performed well in many vision tasks, and using it to improve the performance of MVS has become one of the research hotspots in machine vision. However, how to effectively combine machine learning with MVS is still an open and challenging problem. To this end, this thesis treats the depth-map as the main latent representation of the 3D model and proposes a complete learning-based MVS system, which mainly consists of camera refinement, depth-map estimation, depth-map completion, and depth-map filtering. The main contributions and innovations are summarized as follows:

1) For camera refinement, this thesis proposes a method based on partial bundle adjustment. First, the global visibility graph is partitioned from coarse to fine into a series of well-configured local visibility subgraphs. Then, for each subgraph, the core cameras needed to reconstruct the corresponding local scene are selected and refined by partial bundle adjustment to obtain more accurate parameters. Experiments show that the method effectively improves the accuracy of the camera parameters and significantly improves the quality of the reconstructed model.

2) For depth-map estimation, this thesis proposes P-MVSNet, a multi-view depth prediction network based on patch-wise matching confidence learning. P-MVSNet first computes a pixel-wise matching confidence volume from the mean-square error and then applies a patch-wise matching confidence learning module to improve matching accuracy. It then regularizes the matching confidence volume with a hybrid 3D U-Net and obtains a higher-resolution depth-map through a depth upsampling module. In addition, to improve the quality of the predicted depth-maps, this thesis proposes depth confidence criteria based on the depth probability distribution volume produced by P-MVSNet, and depth-consistency criteria based on a depth-consistency-first strategy and multi-view geometric consistency. Experiments show that P-MVSNet not only reconstructs weak-textured, reflective, and other hard regions well, but also achieves the best overall performance on the DTU and Tanks & Temples benchmarks.

3) Also for depth-map estimation, this thesis further proposes an attention-aware multi-view stereo network, AttMVS. AttMVS first constructs an attention-enhanced matching confidence volume based on an attention-aware pixel-consistency measure. A bottom-up regularization strategy is then designed to regularize this volume. Finally, a gradient loss function is introduced to strengthen the learning ability of AttMVS. Furthermore, this thesis proposes a depth-map filtering algorithm based on a dual-modal depth alignment strategy to improve the quality of the ground-truth data. Experiments show that AttMVS not only further improves the reconstruction quality of weak-textured, reflective, and other hard regions, but also achieves state-of-the-art performance on the DTU and Tanks & Temples benchmarks.

4) For depth-map completion, this thesis proposes USDC, a novel unsupervised single-view depth completion framework, and DHDepth, a dual-head depth prediction model. USDC needs only the sparse depth-map and the reference image to complete depth, and it does not suffer from the well-known domain shift problem. To complete the depth-map, DHDepth encodes both the color and the pixel positions of the reference image so that it captures the deep semantic features and the spatial location features of the target scene. Furthermore, this thesis proposes a robust patch-based sparse depth-map resampling algorithm that reduces the computational resources consumed by DHDepth and thereby improves learning efficiency. Experiments show that USDC not only achieves state-of-the-art performance on the single-view depth completion task, but can also be seamlessly embedded into depth-map-based MVS, where it significantly improves the reconstruction quality of weak-textured, reflective, and other hard regions.
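Contribution 2) mentions a pixel-wise matching confidence volume computed from the mean-square error. The abstract does not give the formula, so the following is only a minimal NumPy sketch under stated assumptions: the source-view features are assumed to be already warped into the reference view for each depth hypothesis (the warping itself is not shown), the squared error is averaged over source views and feature channels, and it is mapped to a confidence with exp(-scale * mse). The function name `mse_matching_confidence` and the `scale` parameter are hypothetical and are not taken from P-MVSNet.

```python
import numpy as np


def mse_matching_confidence(ref_feat, warped_src_feats, scale=1.0):
    """Pixel-wise matching confidence from mean-square error (illustrative sketch).

    ref_feat:         (C, H, W) reference-view features.
    warped_src_feats: (V, D, C, H, W) source-view features, assumed to be already
                      warped into the reference view for each of D depth hypotheses.
    scale:            hypothetical temperature for the exp(-scale * mse) mapping.
    Returns a (D, H, W) volume in (0, 1]; 1 means the features match exactly.
    """
    # Broadcast the reference features against every source view and depth hypothesis.
    diff = warped_src_feats - ref_feat[None, None]   # (V, D, C, H, W)
    # Mean of squared differences over source views (axis 0) and channels (axis 2).
    mse = np.mean(diff ** 2, axis=(0, 2))            # (D, H, W)
    # Low error maps to confidence near 1, high error towards 0.
    return np.exp(-scale * mse)
```

With this mapping, the depth hypothesis whose warped features agree best with the reference view receives the highest confidence at each pixel; a learned module such as the patch-wise one described above would then refine this raw volume.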
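Contribution 2) also derives depth confidence criteria from the depth probability distribution volume. The abstract does not state the exact criterion, so the sketch below shows a common MVSNet-style approach, assumed here for illustration only: a softmax along the depth axis, the expected depth as the depth estimate, and the probability mass inside a small window around the most likely hypothesis as the confidence. `depth_and_confidence` and `window` are hypothetical names.

```python
import numpy as np


def depth_and_confidence(cost_logits, depth_values, window=2):
    """Expected depth and a probability-mass confidence (illustrative sketch).

    cost_logits:  (D, H, W) unnormalized scores, higher meaning more likely.
    depth_values: (D,) depth hypotheses.
    window:       hypothetical half-width (in hypotheses) summed for the confidence.
    """
    # Softmax along the depth axis; subtracting the max keeps exp() stable.
    logits = cost_logits - cost_logits.max(axis=0, keepdims=True)
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)

    # Expected depth: probability-weighted sum of the hypotheses.
    depth = np.tensordot(depth_values, prob, axes=(0, 0))   # (H, W)

    # Confidence: probability mass within +/- window of the most likely hypothesis.
    best = prob.argmax(axis=0)                               # (H, W)
    idx = np.arange(prob.shape[0])[:, None, None]            # (D, 1, 1)
    near_best = np.abs(idx - best[None]) <= window           # (D, H, W)
    conf = (prob * near_best).sum(axis=0)
    return depth, conf
```

A sharply peaked distribution yields a confidence close to 1, while a flat one (typical of weak-textured or reflective pixels) yields a low value, so thresholding `conf` is one simple way to discard unreliable depths before fusion.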
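The depth-consistency criteria in contribution 2) rely on multi-view geometric consistency. The following sketch shows a generic forward-backward reprojection check between a reference depth-map and one source depth-map; it is not the thesis's depth-consistency-first strategy. Each reference pixel is lifted to 3D, projected into the source view, lifted again with the source depth, and projected back, and it is kept only if the pixel and relative depth errors are small. The nearest-neighbour depth lookup, the 1-pixel and 1% thresholds, and the function name `geometric_consistency_mask` are assumptions.

```python
import numpy as np


def geometric_consistency_mask(depth_ref, depth_src, K_ref, K_src, T_ref, T_src,
                               pix_thresh=1.0, rel_depth_thresh=0.01):
    """Forward-backward reprojection check (illustrative sketch).

    K_*: 3x3 intrinsics; T_*: 4x4 world-to-camera matrices.
    Thresholds are hypothetical defaults, not values from the thesis.
    Returns a boolean (H, W) mask of reference pixels confirmed by the source view.
    """
    H, W = depth_ref.shape
    N = H * W
    v, u = np.mgrid[0:H, 0:W].astype(np.float64)
    pix = np.stack([u.ravel(), v.ravel(), np.ones(N)])            # (3, N)
    d = depth_ref.ravel()

    # Reference pixel + depth -> world point.
    X_ref = np.linalg.inv(K_ref) @ pix * d                          # (3, N)
    X_w = np.linalg.inv(T_ref) @ np.vstack([X_ref, np.ones(N)])     # (4, N)

    # World point -> source image coordinates.
    proj = K_src @ (T_src @ X_w)[:3]
    z = proj[2]
    safe_z = np.where(z > 0, z, 1.0)       # avoid dividing by non-positive depth
    u_s, v_s = proj[0] / safe_z, proj[1] / safe_z
    ui, vi = np.rint(u_s).astype(int), np.rint(v_s).astype(int)
    valid = (z > 0) & (ui >= 0) & (ui < W) & (vi >= 0) & (vi < H)

    # Nearest-neighbour lookup of the source depth (0 where out of bounds).
    d_s = np.zeros(N)
    d_s[valid] = depth_src[vi[valid], ui[valid]]
    valid &= d_s > 0

    # Source pixel + sampled depth -> back into the reference view.
    X_src2 = np.linalg.inv(K_src) @ np.stack([u_s, v_s, np.ones(N)]) * d_s
    X_w2 = np.linalg.inv(T_src) @ np.vstack([X_src2, np.ones(N)])
    proj2 = K_ref @ (T_ref @ X_w2)[:3]
    d_r = proj2[2]
    safe_dr = np.where(d_r > 0, d_r, 1.0)
    u_r, v_r = proj2[0] / safe_dr, proj2[1] / safe_dr

    # Keep pixels whose round trip lands close in both position and depth.
    pix_err = np.hypot(u_r - pix[0], v_r - pix[1])
    rel_err = np.abs(d_r - d) / np.maximum(d, 1e-8)
    mask = valid & (d_r > 0) & (pix_err < pix_thresh) & (rel_err < rel_depth_thresh)
    return mask.reshape(H, W)
```

In a full pipeline, masks like this from several source views are typically combined, keeping a reference pixel only if it is consistent in at least a minimum number of views.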
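Contribution 3) introduces a gradient loss to strengthen the learning ability of AttMVS but does not give its form in the abstract. One plausible form, assumed here purely for illustration, is an L1 penalty on the difference between the horizontal and vertical finite-difference gradients of the predicted and ground-truth depth-maps, restricted to valid ground-truth pixels. `gradient_loss` is a hypothetical name.

```python
import numpy as np


def gradient_loss(pred, gt, mask):
    """L1 loss on horizontal and vertical depth gradients (illustrative sketch).

    pred, gt: (H, W) depth-maps; mask: (H, W) bool, True where gt is valid.
    A gradient term counts only where both neighbouring pixels are valid.
    """
    def grads(x):
        # Forward differences along the width and height axes.
        return x[:, 1:] - x[:, :-1], x[1:, :] - x[:-1, :]

    px, py = grads(pred)
    gx, gy = grads(gt)
    mx = mask[:, 1:] & mask[:, :-1]
    my = mask[1:, :] & mask[:-1, :]
    total = np.abs(px - gx)[mx].sum() + np.abs(py - gy)[my].sum()
    count = mx.sum() + my.sum()
    return total / max(count, 1)   # mean over valid gradient terms
```

Because it compares gradients rather than absolute values, this term is unaffected by a constant depth offset and instead penalizes blurred or misplaced depth edges; in training it would normally be added to a per-pixel depth loss rather than used alone.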
Keywords/Search Tags: learning, multi-view stereo (MVS), camera refinement, depth-map estimation, depth-map completion