In recent years, 3D reconstruction technology has been widely used in medicine, virtual reality, autonomous driving, and other fields. Multi-view 3D reconstruction takes a set of images and their corresponding camera parameters as input, matches overlapping images captured from different viewpoints, and computes depth information to generate a complete 3D model. Many high-quality algorithms have been developed for multi-view 3D reconstruction, and the state of the art is improving rapidly. With the successful application of convolutional neural networks to various computer vision tasks, deep learning-based 3D reconstruction has become a research hotspot in multi-view 3D reconstruction. This paper analyzes the current domestic and international research status of multi-view 3D reconstruction. Existing methods still face several open problems: (1) existing multi-view depth estimation networks struggle to capture the relationship between global and local features in the feature extraction stage; (2) existing multi-view 3D reconstruction methods do not sufficiently consider the visibility of pixels in the other views, which results in incomplete reconstructions and difficulties in reconstructing weakly textured and occluded areas. In addition, the generated depth map may suffer from missing details and uneven edges. This paper focuses on these problems in multi-view 3D reconstruction; the main research contents are as follows:

(1) To address the difficulty that existing multi-view depth estimation networks have in capturing the relationship between global and local features during feature extraction, this paper proposes a cascaded 3D reconstruction network based on the self-attention mechanism: a self-attention layer is connected after the convolution layers of the feature extraction network's decoder, so that global context information is obtained while the most relevant key information is captured. A depth map fusion algorithm is also proposed, in which dense point clouds are obtained by checking the consistency of the reprojection errors between the pixels of all views and their 3D points. The experimental results demonstrate the validity of the proposed method: it achieves an overall accuracy of 0.325 mm on the DTU dataset and generalizes well, with an F-score of 55.34 on the Tanks and Temples dataset; the point clouds it reconstructs in real scenes are more complete and detailed, adapting to different datasets and scenes.
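To make the decoder design concrete, the following PyTorch-style sketch shows a decoder stage whose convolution layers are followed by a self-attention layer; the module structure, channel sizes, and learnable residual weight are illustrative assumptions rather than the exact implementation described above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention2D(nn.Module):
    """Non-local self-attention over all spatial positions of a feature map."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // reduction, 1)
        self.key = nn.Conv2d(channels, channels // reduction, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)          # (B, HW, C')
        k = self.key(x).flatten(2)                            # (B, C', HW)
        v = self.value(x).flatten(2)                          # (B, C, HW)
        attn = torch.softmax(q @ k / (k.shape[1] ** 0.5), -1) # (B, HW, HW)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * out  # local conv features + global context

class DecoderBlock(nn.Module):
    """Decoder stage: upsample, convolve, then attach the self-attention layer."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.attn = SelfAttention2D(out_ch)

    def forward(self, x):
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        return self.attn(self.conv(x))
```

Since gamma is initialized to zero, the block starts from purely convolutional features and gradually mixes in global context as training progresses.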
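The fusion algorithm's consistency check can be sketched in NumPy as a round-trip reprojection test, assuming 3×3 intrinsics K and 4×4 world-to-camera extrinsics E for each view; the thresholds and the nearest-neighbour depth lookup are simplifying assumptions, and in practice a pixel is kept only if it passes this check in a minimum number of source views.

```python
import numpy as np

def consistency_mask(depth_ref, K_ref, E_ref, depth_src, K_src, E_src,
                     pix_thresh=1.0, depth_thresh=0.01):
    """Keep a reference pixel if, after projecting into the source view and
    back, the round-trip pixel error and relative depth error are small."""
    h, w = depth_ref.shape
    v, u = np.mgrid[0:h, 0:w]
    ones = np.ones_like(u, dtype=np.float64)

    # Reference pixels -> 3D world points.
    cam_ref = np.linalg.inv(K_ref) @ (np.stack([u, v, ones]).reshape(3, -1)
                                      * depth_ref.ravel())
    world = np.linalg.inv(E_ref) @ np.vstack([cam_ref, np.ones((1, h * w))])

    # World points -> source image plane.
    cam_src = (E_src @ world)[:3]
    pix_src = K_src @ cam_src
    us = pix_src[0] / np.maximum(pix_src[2], 1e-8)
    vs = pix_src[1] / np.maximum(pix_src[2], 1e-8)

    # Sample the source depth map (nearest neighbour for brevity).
    ui, vi = np.round(us).astype(int), np.round(vs).astype(int)
    valid = (ui >= 0) & (ui < w) & (vi >= 0) & (vi < h) & (pix_src[2] > 0)
    d_src = np.where(valid, depth_src[np.clip(vi, 0, h - 1),
                                      np.clip(ui, 0, w - 1)], 0.0)

    # Source pixels -> world -> back into the reference view.
    cam_back = np.linalg.inv(K_src) @ (np.stack([us, vs, np.ones_like(us)]) * d_src)
    world_back = np.linalg.inv(E_src) @ np.vstack([cam_back, np.ones((1, h * w))])
    cam_rr = (E_ref @ world_back)[:3]
    pix_rr = K_ref @ cam_rr
    ur = pix_rr[0] / np.maximum(pix_rr[2], 1e-8)
    vr = pix_rr[1] / np.maximum(pix_rr[2], 1e-8)

    # Consistency: small round-trip pixel error and small relative depth error.
    reproj_err = np.sqrt((ur - u.ravel()) ** 2 + (vr - v.ravel()) ** 2)
    depth_err = np.abs(cam_rr[2] - depth_ref.ravel()) / np.maximum(depth_ref.ravel(), 1e-8)
    return (valid & (reproj_err < pix_thresh) & (depth_err < depth_thresh)).reshape(h, w)
```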
(2) To address the insufficient consideration of pixel visibility in the other views, which results in incomplete reconstructions and difficulties in reconstructing weakly textured and occluded areas, an adaptive cost aggregation method based on visibility awareness is proposed: the network estimates the visibility of each pixel in every view and uses it to weight the cost aggregation, improving the completeness of reconstruction in occluded areas. To address the missing details and uneven edges that the generated depth map may exhibit, this paper presents a depth map optimization module based on a convolutional spatial propagation network, which uses learned affinity to guide depth map refinement and produce an optimized depth map. The comparison results show that this method is not inferior to UCSNet at recovering details in public dataset scenes and real-world scenes, achieves good completeness in weakly textured detail areas, reaches an overall accuracy of 0.321 mm on the DTU dataset and an F-score of 55.15 on the Tanks and Temples dataset, and significantly reduces memory consumption and running time: it takes only 0.41 s to reconstruct a high-quality depth map at 1600×1184 resolution, with a memory footprint of only 4.5 GB.
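A minimal sketch of visibility-aware aggregation follows, assuming an MVSNet-style pipeline in which source-view features have already been warped to the reference view over D depth hypotheses; the shallow visibility network and the weighting scheme below are illustrative assumptions, not the exact module proposed above.

```python
import torch
import torch.nn as nn

class VisibilityAwareAggregation(nn.Module):
    """Aggregate per-source-view matching costs with predicted visibility
    weights instead of treating every source view equally."""
    def __init__(self):
        super().__init__()
        # Shallow 2D network mapping a two-view cost slice to a visibility map.
        self.vis_net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, ref_feat, warped_feats):
        # ref_feat: (B, C, H, W); warped_feats: list of (B, C, D, H, W) source
        # features warped to the reference view over D depth hypotheses.
        weighted, weights = 0.0, 0.0
        for src in warped_feats:
            # Two-view cost: per-depth similarity between reference and source.
            cost = (ref_feat.unsqueeze(2) * src).mean(dim=1, keepdim=True)  # (B,1,D,H,W)
            # Visibility from the most confident depth hypothesis per pixel.
            vis = self.vis_net(cost.max(dim=2).values)                      # (B,1,H,W)
            weighted = weighted + cost * vis.unsqueeze(2)
            weights = weights + vis.unsqueeze(2)
        return weighted / (weights + 1e-6)  # aggregated cost volume (B,1,D,H,W)
```

Occluded or unreliable source views thus receive small weights per pixel, so they contribute little to the aggregated cost volume instead of corrupting it.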
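The propagation step of the depth map optimization module can be sketched as follows, assuming CSPN-style updates driven by a learned 8-neighbour affinity field (predicted by a small CNN that is not shown); the iteration count and the normalization are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CSPNRefinement(nn.Module):
    """Refine an initial depth map by iteratively propagating values between
    neighbouring pixels, guided by a learned affinity field."""
    def __init__(self, iterations=12):
        super().__init__()
        self.iterations = iterations

    def forward(self, depth, affinity):
        # depth: (B, 1, H, W) initial depth; affinity: (B, 8, H, W) learned
        # weights for the 8 neighbours of every pixel.
        abs_sum = affinity.abs().sum(dim=1, keepdim=True).clamp(min=1e-6)
        neigh_w = affinity / abs_sum                       # normalized neighbour weights
        self_w = 1.0 - neigh_w.sum(dim=1, keepdim=True)    # weight kept by the pixel
        for _ in range(self.iterations):
            pad = F.pad(depth, (1, 1, 1, 1))
            # Gather the 8 shifted neighbour maps around every pixel.
            neighbours = torch.cat([
                pad[:, :, i:i + depth.shape[2], j:j + depth.shape[3]]
                for i in range(3) for j in range(3) if not (i == 1 and j == 1)
            ], dim=1)                                      # (B, 8, H, W)
            depth = self_w * depth + (neigh_w * neighbours).sum(dim=1, keepdim=True)
        return depth
```

Because the affinities are learned, the propagation smooths depth along surfaces while stopping at object boundaries, which is what sharpens uneven edges and fills in missing details.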