Monocular depth estimation is a fundamental task in computer vision and artificial intelligence research, which aims to predict accurate depth from image sequences through network training. High-quality depth information can be applied in various fields, such as autonomous robot navigation, 3D reconstruction, and augmented reality. Existing works use encoder-decoder networks to extract image features and then train the network with a photometric loss based on view synthesis, achieving self-supervised monocular depth estimation. However, these approaches lack a detailed perception of scene information and are vulnerable to environmental factors, which limits network performance. To obtain more accurate depth estimation results for monocular image sequences, this thesis proposes depth estimation methods based on parallel feature enhancement and semantic guidance. In addition, the thesis trains a lightweight monocular depth estimation network based on knowledge distillation and applies it to the 3D reconstruction task. The main contributions of the thesis are as follows:

(1) Existing self-supervised depth estimation networks lack an effective feature extraction mechanism, so feature details are lost as features propagate through the network. To address this problem, the thesis designs a monocular depth estimation network based on parallel feature enhancement. The network performs multi-stage feature enhancement and parallel feature fusion based on coordinate attention, which extracts feature information from images more thoroughly. In addition, the multi-scale training loss is optimized to obtain more accurate depth estimation results.

(2) Illumination changes and texture-less regions in outdoor scenes distort the distribution of the photometric loss, making the estimated depth at object boundaries inaccurate. To address this problem, the thesis proposes a depth estimation method that combines implicit and explicit semantic guidance. The method uses an implicit cross-task feature interaction and fusion mechanism to enhance the complementarity of feature information between tasks. Moreover, a metric loss is designed based on the contrasting relationship of depth features at explicit semantic boundaries, which refines the depth at object boundaries and further improves the accuracy of depth estimation.

(3) To improve the efficiency of depth estimation, the thesis designs a lightweight monocular depth estimation network based on knowledge distillation. The distillation training loss is constructed with depth uncertainty, which reduces the influence of noisy pseudo labels and improves the accuracy of the network. The thesis further designs a 3D reconstruction framework that incorporates the lightweight monocular depth estimation network. The framework uses a visual odometry method for localization and then builds a map of the 3D environment based on surfel fusion, which achieves 3D visualization of depth maps and broadens the application of the monocular depth estimation method.

(4) The thesis adopts KITTI and other datasets to train and evaluate the above networks. The experimental results show that the proposed monocular depth estimation networks achieve improvements in accuracy or efficiency. In addition, 3D environment reconstruction experiments based on the lightweight monocular depth estimation network are carried out to demonstrate the feasibility of the method.
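
For context, the view-synthesis training signal referred to above is usually formulated as a photometric reconstruction error between the target frame and an image synthesized by warping an adjacent frame with the predicted depth and relative camera pose. The exact weighting used in the thesis is not stated in this abstract; a common formulation in self-supervised pipelines combines an SSIM term with an L1 term:

\[
\mathcal{L}_{p}(I_t,\hat{I}_t) \;=\; \frac{\alpha}{2}\bigl(1 - \mathrm{SSIM}(I_t,\hat{I}_t)\bigr) \;+\; (1-\alpha)\,\lVert I_t - \hat{I}_t \rVert_1 ,
\]

where \(I_t\) is the target frame, \(\hat{I}_t\) is the synthesized frame, and \(\alpha\) (often set to 0.85) balances the structural and absolute intensity terms.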
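
Coordinate attention, on which the parallel feature enhancement design relies, is an existing attention module (Hou et al., CVPR 2021) that pools features along the height and width axes separately so that channel attention retains positional information. The PyTorch sketch below reproduces the standard module only; the reduction ratio and the way the thesis embeds it into multi-stage enhancement and parallel fusion are not specified in this abstract and are assumed here for illustration.

import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Standard coordinate attention block (Hou et al., 2021).

    Pools features along H and W separately, encodes the two
    direction-aware descriptors jointly, then splits them back into
    per-axis attention maps that reweight the input feature map.
    """

    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # (B, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        x_h = self.pool_h(x)                       # (B, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)   # (B, C, W, 1)
        y = torch.cat([x_h, x_w], dim=2)           # (B, C, H+W, 1)
        y = self.act(self.bn1(self.conv1(y)))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        return x * a_h * a_w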
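
Contribution (3) constructs the distillation loss with depth uncertainty to suppress noisy pseudo labels. The abstract does not give the exact form, so the following snippet is only a minimal, hypothetical sketch of the general idea: a per-pixel distillation residual attenuated by a predicted uncertainty map, with an additive penalty that keeps the uncertainty bounded (in the spirit of heteroscedastic uncertainty weighting). The function name and arguments are illustrative, not the thesis's API.

import torch

def uncertainty_weighted_distillation_loss(
    student_depth: torch.Tensor,   # (B, 1, H, W) student prediction
    teacher_depth: torch.Tensor,   # (B, 1, H, W) pseudo label from the teacher
    log_variance: torch.Tensor,    # (B, 1, H, W) predicted log uncertainty
) -> torch.Tensor:
    """Hypothetical uncertainty-weighted distillation loss.

    Pixels with high predicted uncertainty (large log_variance) are
    down-weighted, reducing the influence of noisy pseudo labels; the
    additive log_variance term prevents the uncertainty from growing
    without bound.
    """
    residual = torch.abs(student_depth - teacher_depth)
    per_pixel = torch.exp(-log_variance) * residual + log_variance
    return per_pixel.mean()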