With the rapid development of computer vision and deep learning technologies, 3D reconstruction has become a key technology in many application fields, such as virtual reality, architecture and urban planning, autonomous driving, and medical imaging. In these fields, high-quality 3D reconstruction is crucial for more accurate scene understanding and interaction. However, active 3D reconstruction methods suffer from high cost, increased system complexity, limited operational range, security and privacy issues, and insufficient real-time performance, while traditional 3D reconstruction methods are often limited by high computational complexity, low reconstruction accuracy, and sensitivity to scene and lighting conditions. Multi-view image-based 3D reconstruction, which requires only ordinary cameras or smartphones to conveniently capture the input images, therefore has great development potential and research value. This study aims to explore the key technologies of deep-learning-based multi-view 3D reconstruction and to propose an effective 3D reconstruction system consisting of sparse reconstruction and dense reconstruction. The main contents of this paper are as follows:

(1) In the sparse reconstruction phase, considering the problems caused in 3D reconstruction tasks by the feature points and descriptors extracted by traditional algorithms, a deep learning method is adopted to learn richer feature representations. To address the poor rotational invariance of existing deep-learning-based feature detection methods, a new feature point detection framework built on the R2D2 network is proposed. The framework takes into account the fundamental issue that convolution is not fully rotationally invariant: two or more images with different rotation angles are fed into the R2D2 network for feature point detection and descriptor computation, producing multiple rotated versions of the descriptors, and these descriptors are integrated to obtain better and richer feature point descriptions (a minimal sketch of this idea is given after this paragraph). Evaluation results on the HPatches dataset show that the proposed feature detection method increases the average number of correctly matched point pairs by 23.5% compared with the original CNN framework R2D2; although the average correct matching rate declines slightly, the root mean square matching error decreases by 27.8%. Compared with other improved methods, the proposed feature detection framework is more robust and adaptable, improves matching accuracy, and achieves better detection results in low-texture or complex-texture areas. The multi-view image features detected with the new framework are then applied to subsequent feature matching and incremental SfM, forming a complete sparse reconstruction pipeline. Sparse reconstruction experiments on the DTU dataset show that, compared with COLMAP based on traditional feature detection, the proposed sparse reconstruction method based on deep learning feature detection reconstructs an additional 4453.87 three-dimensional points with only a 0.24 GB increase in memory usage, reduces the reprojection error by 34.73%, and shortens the running time by 12.89 s, achieving better sparse reconstruction results in a shorter time.
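To make the rotation-augmented detection idea concrete, the following Python sketch shows one possible realization under stated assumptions: the image is rotated by several angles, each copy is passed through a feature network, the detected keypoints are mapped back to the original frame, and co-located descriptors are fused. Here detect_and_describe is a hypothetical placeholder for the R2D2 forward pass (not part of R2D2's public API), and the rotation angles, fusion radius, and normalized-averaging fusion are illustrative choices only; the text above states only that multiple rotated descriptor versions are integrated.

    # Minimal sketch of rotation-augmented feature description (assumptions noted in the lead-in).
    # detect_and_describe() stands in for the R2D2 forward pass; descriptor fusion by
    # L2-normalized averaging is an illustrative choice, not the thesis's exact scheme.
    import cv2
    import numpy as np

    def rotate_image(img, angle_deg):
        """Rotate an image about its centre; return the rotated image and the 2x3 affine matrix."""
        h, w = img.shape[:2]
        M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle_deg, 1.0)
        return cv2.warpAffine(img, M, (w, h)), M   # border cropping is ignored for simplicity

    def backproject_keypoints(kpts, M):
        """Map keypoints detected in the rotated image back to the original image frame."""
        M_full = np.vstack([M, [0.0, 0.0, 1.0]])
        M_inv = np.linalg.inv(M_full)
        pts_h = np.hstack([kpts, np.ones((len(kpts), 1))])   # homogeneous pixel coordinates
        return (M_inv @ pts_h.T).T[:, :2]

    def rotation_augmented_features(img, detect_and_describe, angles=(0, 90, 180, 270), radius=2.0):
        """Run the detector on several rotated copies and fuse descriptors of co-located keypoints."""
        all_pts, all_desc = [], []
        for a in angles:
            rot, M = rotate_image(img, a)
            kpts, desc = detect_and_describe(rot)            # (N, 2) coordinates, (N, D) descriptors
            all_pts.append(backproject_keypoints(kpts, M))
            all_desc.append(desc)
        pts, desc = np.vstack(all_pts), np.vstack(all_desc)

        # Greedy fusion: average the descriptors of detections that land within `radius`
        # pixels of each other in the original frame, then re-normalize.
        fused_pts, fused_desc, used = [], [], np.zeros(len(pts), dtype=bool)
        for i in range(len(pts)):
            if used[i]:
                continue
            close = (~used) & (np.linalg.norm(pts - pts[i], axis=1) < radius)
            used |= close
            fused_pts.append(pts[close].mean(axis=0))
            d = desc[close].mean(axis=0)
            fused_desc.append(d / (np.linalg.norm(d) + 1e-8))
        return np.array(fused_pts), np.array(fused_desc)

For example, passing a grayscale image together with any callable that returns an (N, 2) array of keypoint coordinates and an (N, D) array of descriptors yields fused keypoints and descriptors that can be fed directly into feature matching and incremental SfM.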
(2) In the dense reconstruction phase, to address the low accuracy, poor completeness, and difficulty of utilizing high-resolution images in current multi-view 3D reconstruction, an Iterative Attention-based Multi-View Stereo Network (IAMVSNet) is proposed. This approach deeply exploits the high-level semantic information of the images to enhance the precision and robustness of the reconstruction algorithm's stereo matching. First, an Attention Gate module is integrated into the feature extraction network, improving its feature extraction capability by capturing global information to obtain a larger receptive field and richer context. Second, an iterative cost-volume regularization module based on the Convolutional Block Attention Module (CBAM) is designed to integrate multi-scale information (a sketch of such an attention block is given at the end of this section): the regularized small-scale cost volume is used for subsequent depth regression and can also be fused with the lower-level cost volume, achieving stepwise depth prediction from low resolution to high resolution and significantly improving the accuracy and completeness of the reconstruction. Finally, in the testing phase, a new inherited consistency filtering method is proposed, which reduces the impact of erroneous information from the initial depth hypothesis plane on the cascade network's depth estimation and further improves the reconstruction results. Experimental results on the DTU dataset show that the point cloud completeness and overall metrics of the proposed method improve by 15.3% and 2.5%, respectively, compared with the classic CasMVSNet algorithm, and the proposed method also shows clear improvements over other improved algorithms.

(3) Based on the proposed sparse reconstruction method with deep learning feature detection and the Iterative Attention-based Multi-View Stereo Network, and referencing the structure of the open-source software COLMAP, a small evaluation software for deep-learning-based multi-view 3D reconstruction is designed and implemented. The software provides feature detection and matching, sparse reconstruction, and dense reconstruction, and it also allows convenient visualization and export of the reconstruction results, making it suitable for fields such as architecture and urban planning, medical image analysis, archaeology, and paleobiology.
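Contribution (2) builds its iterative cost-volume regularization on CBAM, whose standard form applies channel attention followed by spatial attention. The sketch below is a straightforward 3D adaptation of that standard module to an MVS cost volume of shape (batch, channels, depth hypotheses, height, width); the 3D extension, reduction ratio, and kernel size are assumptions made for illustration and do not reproduce the exact module used in IAMVSNet.

    # Illustrative 3D adaptation of the Convolutional Block Attention Module (CBAM)
    # applied to an MVS cost volume of shape (B, C, D, H, W). The 3D extension and all
    # hyper-parameters here are expository assumptions, not the thesis's exact module.
    import torch
    import torch.nn as nn

    class ChannelAttention3D(nn.Module):
        def __init__(self, channels, reduction=8):
            super().__init__()
            self.mlp = nn.Sequential(              # shared MLP for avg- and max-pooled features
                nn.Conv3d(channels, channels // reduction, 1, bias=False),
                nn.ReLU(inplace=True),
                nn.Conv3d(channels // reduction, channels, 1, bias=False),
            )

        def forward(self, x):
            avg = self.mlp(torch.mean(x, dim=(2, 3, 4), keepdim=True))
            mx = self.mlp(torch.amax(x, dim=(2, 3, 4), keepdim=True))
            return torch.sigmoid(avg + mx)         # (B, C, 1, 1, 1) channel weights

    class SpatialAttention3D(nn.Module):
        def __init__(self, kernel_size=7):
            super().__init__()
            self.conv = nn.Conv3d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

        def forward(self, x):
            avg = torch.mean(x, dim=1, keepdim=True)   # (B, 1, D, H, W)
            mx, _ = torch.max(x, dim=1, keepdim=True)
            return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

    class CBAM3D(nn.Module):
        """Channel attention followed by spatial attention, as in the original 2D CBAM."""
        def __init__(self, channels, reduction=8, kernel_size=7):
            super().__init__()
            self.ca = ChannelAttention3D(channels, reduction)
            self.sa = SpatialAttention3D(kernel_size)

        def forward(self, cost_volume):
            cost_volume = cost_volume * self.ca(cost_volume)
            return cost_volume * self.sa(cost_volume)

    # Example: re-weight an 8-channel cost volume with 48 depth hypotheses at reduced resolution.
    if __name__ == "__main__":
        volume = torch.randn(1, 8, 48, 160, 128)
        out = CBAM3D(channels=8)(volume)
        print(out.shape)                           # torch.Size([1, 8, 48, 160, 128])

In a cascade setting such as the one described above, a block of this kind can be applied to the cost volume at each stage before depth regression, so that the re-weighted small-scale volume guides the fusion with the next, higher-resolution stage.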