Image-based three-dimensional (3D) reconstruction recovers 3D information from images and rebuilds a model of the scene. It has long been an important research topic in computer vision. With the development of computer vision and improvements in hardware performance, 3D models play an increasingly important role in mapping, education, medicine, film and entertainment. The research value of semantic scene reconstruction for augmented reality (AR) and autonomous driving has also become increasingly prominent. Building on Marr's theoretical framework of vision, many 3D reconstruction methods have been developed that rely on different mechanisms, equipment and assumptions. Monocular cameras are cheap, flexible and widely used in everyday life, so monocular 3D reconstruction is of particular practical value. Traditional image-based reconstruction algorithms perform poorly on scenes with little texture, complex geometry or monotonous structures. The rapid development of deep learning has made it possible to apply learning-based methods to 3D reconstruction. Deep learning has also made great progress in semantic segmentation, and fusing semantic information into 3D models to form semantic 3D models has become an important research direction. This paper focuses on three topics: feature point extraction, dense image depth estimation, and image semantic segmentation combined with depth maps.

1. Interest point detection is the basis of any 3D reconstruction system built on interest points. The density and accuracy of the interest points affect the accuracy of structure-from-motion recovery. This paper studies a self-supervised interest point detection algorithm based on deep learning. It uses an end-to-end method to obtain interest point positions and descriptors simultaneously. While ensuring that enough interest points are generated for pose estimation and optimization, the algorithm also takes the repeatability, stability and detection speed of the interest points into account.

2. Traditional monocular depth estimation depends strongly on the sparsity of feature points, while purely learning-based estimation algorithms require large amounts of data, are hard to interpret and have low accuracy. To address these problems, this paper studies a dense depth estimation algorithm that incorporates geometric information. The model improves both the cost volume construction method and the cost volume regularization network, obtaining finer depth estimates while consuming fewer hardware resources. The resulting depth maps can then be used for dense 3D reconstruction based on depth fusion.

3. To improve the accuracy of semantic segmentation with deep neural networks, this paper studies the design ideas and specific methods of several mainstream semantic segmentation networks. A depth map branch is added to the DeepLabv3+ network structure. Multi-level and multi-scale information fusion between the RGB network and the depth map network is used to obtain more accurate semantic segmentation results.

Finally, this paper builds an experimental platform for semantic 3D reconstruction based on the results of the research described above. Experiments on open-source datasets are used to study the effectiveness and performance of the platform. In addition, images taken from five scenes are used to perform semantic 3D reconstruction on the platform, which demonstrates its usability in real scenes.
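
To illustrate how a depth map and a semantic segmentation result can be combined into a semantic 3D representation, the following minimal sketch back-projects every pixel with a valid depth into camera coordinates and attaches its class label. It is not the depth fusion method of this paper. It assumes a pinhole camera model, and the function name `backproject_semantic` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical names introduced only for this example.

```python
import numpy as np


def backproject_semantic(depth, labels, fx, fy, cx, cy):
    """Lift a depth map and a per-pixel label map to a labeled point cloud.

    Assumes a pinhole camera (illustrative assumption, not from the paper).
    depth:  (H, W) array of depths along the optical axis; <= 0 means invalid.
    labels: (H, W) array of integer class labels from a segmentation network.
    Returns (N, 3) points in camera coordinates and their (N,) labels.
    """
    h, w = depth.shape
    # v indexes rows (image y), u indexes columns (image x).
    v, u = np.mgrid[0:h, 0:w]
    valid = depth > 0
    z = depth[valid]
    # Invert the pinhole projection u = fx * x / z + cx, v = fy * y / z + cy.
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    points = np.stack([x, y, z], axis=1)
    return points, labels[valid]


# Toy 2x2 example with one invalid pixel (depth 0).
depth = np.array([[1.0, 0.0],
                  [2.0, 2.0]])
labels = np.array([[3, 3],
                   [7, 1]])
points, point_labels = backproject_semantic(
    depth, labels, fx=1.0, fy=1.0, cx=0.5, cy=0.5
)
```

Point clouds produced this way from several views would still need to be transformed by the estimated camera poses and merged before they form a complete semantic model.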