Depth estimation has long been a fundamental task in computer vision. Predicting the depth of every pixel benefits a wide range of real-world downstream tasks, such as autonomous driving, 3D scene reconstruction, augmented reality, and robotic manipulation. However, estimating depth directly from one or more captured images faces many challenges: how to predict accurate depth quickly and effectively, how to train neural networks in the absence of large real-world training datasets, and how to use depth information efficiently to assist downstream tasks. In-depth study of these problems is of great significance both for the development of the related scientific fields and for various industrial applications.

This dissertation focuses on improving depth estimation algorithms based on image vision technology and on using depth estimation to improve the performance of other downstream computer vision tasks. Within depth estimation, this dissertation pursues two major directions: supervised and unsupervised depth estimation. Supervised algorithms train a monocular or stereo depth estimation network to predict accurate depths using, as supervision, accurate depth maps obtained from external devices such as depth cameras; the key problem they face is how to improve accuracy and efficiency. However, obtaining reliable annotations with dense depth information in the real world is very difficult: collecting such data costs substantial human and material resources, and it cannot be acquired precisely due to the physical limitations of current depth cameras. It is therefore of great significance for science and industry to implement
unsupervised depth estimation algorithms using self-supervised visual constraints between adjacent frames or stereo image pairs. Beyond improving depth estimation models and their training, this dissertation also explores in depth how to extract features from depth information to assist other downstream tasks; specifically, it presents a detailed study of how depth information can improve accuracy in the downstream task of object pose estimation. The main research contents and innovations of this dissertation are as follows:

1. The problem of unsupervised monocular depth estimation based on a self-distillation training approach is investigated. A two-stage unsupervised monocular depth estimation algorithm is first proposed to address the inability of the traditional one-stage unsupervised monocular scheme to predict accurate depth efficiently. The first stage uses a conventional fully convolutional network to predict the depth of a monocular image, and the second stage refines the first-stage prediction to obtain a fine depth map. Second, a self-distillation loss function is proposed to address the lack of reasonable supervision in the self-supervised two-stage network. The self-distillation loss mimics the supervision of traditional distillation, but unlike traditional distillation, which uses independent teacher and student networks, it treats the entire two-stage network as the teacher and uses the teacher's predicted depth to supervise the first-stage prediction. In the proposed self-distillation algorithm, the teacher network can use the student network's rough prediction as an initialization to obtain a better estimate, while the student network is further improved under the teacher's supervision. Experiments
show that the proposed self-distillation unsupervised monocular depth estimation algorithm improves performance without increasing inference time or model parameters.

2. The problem of using the complementary properties of unsupervised monocular and stereo depth networks to mutually improve the accuracy of both is investigated. To address the difficulty that self-supervised monocular depth estimation cannot exploit the properties of stereo camera systems to estimate accurate depth, a mask-aware distillation module is proposed that uses the predictions of the stereo depth network as structured pseudo-labels to supervise the training of the monocular depth estimation network. Although the stereo depth network is also trained in an unsupervised manner, its predicted depths are much better than those of direct unsupervised monocular training, which improves the accuracy of monocular depth estimation. In response to the poor performance of unsupervised stereo depth estimation in occluded regions, an occlusion-aware fusion module combining monocular and stereo depth estimation is proposed to further improve accuracy. In regions occluded by the parallax between the stereo images, the monocular algorithm predicts better than the stereo algorithm, while in unoccluded regions the stereo algorithm outperforms the monocular network. Experiments show that the proposed mutually beneficial training of monocular and stereo depth estimation improves the performance of both without additional training cost.

3. The problem of establishing dense correspondences for estimating object poses using depth information is investigated. An algorithm is proposed that converts the depth information into a point cloud and then extracts point-wise cloud features, addressing the problem that deeper features cannot be obtained by
relying on image information alone. The computed point cloud features are used together with the image features for object pose estimation. To address the problem that relying on only a few keypoints cannot cope with 6DoF pose estimation in complex scenes such as those with occlusion, a 6DoF pose estimation algorithm based on dense correspondences is proposed. The algorithm first uses the combined point cloud and image features to predict, for each pixel, the corresponding point on the object model, establishing dense point correspondence pairs, and then directly applies a least-squares algorithm to compute the object's 6DoF pose. Experiments show that the proposed 6DoF pose estimation method based on depth information and dense correspondences improves accuracy under severe occlusion.

For the research described above, the proposed algorithms have been recognized in the field, which will promote the further improvement and application of depth estimation algorithms.
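The self-distillation constraint of contribution 1 can be sketched as follows. This is a minimal illustration only, assuming a plain L1 distance between the coarse first-stage depth and the refined second-stage depth used as a fixed teacher target; the function and variable names are hypothetical, and the dissertation's actual loss may differ in form.

```python
import numpy as np

def self_distillation_loss(coarse_depth, refined_depth):
    """Treat the refined two-stage output as a teacher pseudo-label for the
    coarse first-stage prediction. In a deep learning framework the teacher
    term would be wrapped in a stop-gradient so that this loss only
    penalizes the first stage."""
    teacher = refined_depth  # gradients would be detached here in practice
    return float(np.mean(np.abs(coarse_depth - teacher)))
```

The loss is zero when the two stages agree, so it pushes the student (first stage) toward the teacher (full two-stage output) without requiring ground-truth depth.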
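The occlusion-aware fusion of contribution 2 can be sketched per pixel as a masked blend. The hard 0/1 mask below is a simplification for illustration (a learned module might instead predict a soft weight map), and the names are hypothetical:

```python
import numpy as np

def occlusion_aware_fusion(mono_depth, stereo_depth, occ_mask):
    """Blend the two depth predictions per pixel: where occ_mask is 1
    (pixels occluded in the other stereo view, where stereo matching is
    unreliable) trust the monocular network, and elsewhere trust the
    stereo network."""
    return occ_mask * mono_depth + (1.0 - occ_mask) * stereo_depth
```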
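The depth-to-point-cloud conversion in contribution 3 is, in its simplest form, pinhole back-projection with known camera intrinsics. A minimal sketch, assuming a standard pinhole model and hypothetical function names:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map to an (H*W, 3) camera-space point
    cloud via the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    Invalid (zero) depths simply map to the origin in this sketch."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)
```

The resulting point cloud is what the per-point feature extractor consumes alongside the image features.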
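Once each pixel's camera-space 3D point is paired with its predicted model point, the final pose step of contribution 3 is a least-squares rigid alignment of 3D-3D correspondences. One standard closed-form solver for this is the Kabsch/Umeyama algorithm; the dissertation does not specify its exact solver, so the SVD-based version below is an illustrative choice:

```python
import numpy as np

def pose_from_correspondences(model_pts, cam_pts):
    """Closed-form least-squares rigid pose (Kabsch): find R, t minimizing
    sum ||R @ m_i + t - c_i||^2 over (N, 3) point arrays. The sign
    correction via det() guards against returning a reflection."""
    mu_m = model_pts.mean(axis=0)
    mu_c = cam_pts.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (model_pts - mu_m).T @ (cam_pts - mu_c)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_c - R @ mu_m
    return R, t
```

Because the correspondences are dense (one per visible pixel) rather than a few keypoints, the least-squares fit stays well-constrained even when large parts of the object are occluded.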