Depth estimation is an important task in computer vision, with wide applications in fields such as autonomous driving, 3D reconstruction, and object detection. Monocular depth estimation has attracted widespread research attention because its input images are easy to acquire and it is broadly applicable. Existing monocular depth estimation methods fall into two main types: supervised and self-supervised. The former relies on ground-truth depth, which is sparse and expensive to obtain; the latter is based on the principle of image reconstruction and suffers from severe boundary blur. Semi-supervised monocular depth estimation, which combines the two, has therefore become a recent research hotspot.

Existing semi-supervised monocular depth estimation methods use two networks, one trained with supervision and one trained with self-supervision, and use the pseudo labels each generates to supervise the other. This approach still has shortcomings. On the one hand, because the two networks are trained with different algorithms, the pseudo labels they generate are of poor quality and their predictions are inconsistent. An additional uncertainty network usually has to be designed to filter the pseudo labels, which limits model performance and increases computational complexity. On the other hand, semi-supervised methods built on the encoder-decoder framework lose a large amount of information during the encoding stage, so the predicted depth maps are blurred at object boundaries and in regions with complex texture. To solve these problems, this paper proposes a semi-supervised monocular depth estimation algorithm based on consistency distillation. The main work is as follows:

(1) A semi-supervised monocular depth estimation method based on EMA consistency distillation and pseudo-label reconstruction is designed to address the poor pseudo-label quality and inconsistent predictions of existing semi-supervised models. The method trains both a student model and a teacher model with supervised and self-supervised objectives, and uses an exponential moving average (EMA) to improve the consistency between the teacher model and the student model, which lowers the pseudo-label loss and eases model training. An uncertainty-guided pseudo-label reconstruction module is also designed; it reconstructs pseudo labels from the outputs of the teacher and student models, improving pseudo-label quality and reducing the computational burden of training.

(2) To address blurred object boundaries and blurred depth in texture-dense regions in semi-supervised models, the decoder of the semi-supervised monocular depth estimation network described above is redesigned. A channel attention mechanism is introduced to construct a boundary-enhancement attention module that focuses on depth at object boundaries, and a spatial attention mechanism is introduced to construct a region-enhancement attention module that improves prediction in texture-dense regions. The two modules are integrated into a multi-scale monocular depth decoder that attends to the depth information of objects at different scales in the image, extracting object-boundary and dense-region information more efficiently, reducing the loss of effective information, and improving model performance.

Comparative experiments were conducted on the large-scale autonomous driving dataset KITTI and the indoor dataset NYU-v2. The results show that the proposed model achieves a good performance improvement.

This research is based on the vertical research project "Development and Application of Target Recognition Technology for Unmanned Vehicles", and its content comes from the project task "Model Construction and Algorithm Design".
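The EMA step named in contribution (1) can be illustrated with a minimal sketch. The abstract does not give the update rule, so the code below assumes the common mean-teacher form, in which the teacher's parameters track an exponential moving average of the student's. The function name `ema_update`, the `decay` parameter, and the use of plain dictionaries of floats in place of network weights are all hypothetical choices made for illustration.

```python
def ema_update(teacher, student, decay=0.99):
    """Move each teacher parameter toward the matching student parameter.

    Assumed rule (not taken from the paper):
        teacher = decay * teacher + (1 - decay) * student
    Parameters are plain floats keyed by name, standing in for tensors.
    The teacher dictionary is updated in place and returned.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    if teacher.keys() != student.keys():
        raise ValueError("teacher and student must share parameter names")
    for name, student_value in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * student_value
    return teacher
```

Under this assumption, a `decay` close to 1 makes the teacher change slowly, which is one way the teacher's pseudo labels could stay consistent with the student's predictions while the student is still training. Calling `ema_update(teacher, student)` once after each student optimization step would be the usual pattern.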