Recovering scene depth information is an important research topic in computer vision. Because depth information is lost during imaging, a two-dimensional image captured by an ordinary camera cannot fully convey the three-dimensional structure of the scene, which is a central difficulty for many computer vision tasks that operate on two-dimensional images. Accurately recovering scene depth would therefore help solve many existing computer vision tasks, including simultaneous localization and mapping (SLAM), robot navigation, object detection, and semantic segmentation. Depth maps obtained from sensors are sparse and have a limited acquisition range, which restricts their wide application, so research on recovering scene depth from two-dimensional images is of great significance.

This paper focuses on depth estimation from monocular images. Early monocular depth estimation techniques relied on idealized environmental assumptions, so the resulting depth maps were of poor quality. In recent years, deep learning methods have been used to obtain depth information from two-dimensional images and have achieved strong results. However, supervised deep learning methods are limited by the difficulty of acquiring ground-truth depth, while unsupervised methods suffer from scale ambiguity in the estimated depth because no ground-truth depth is available during training. To address these problems, this paper proposes a domain adaptation depth estimation algorithm that performs supervised learning without using ground-truth depth from real scenes, and incorporates techniques from other vision tasks into the proposed framework to obtain higher-quality depth maps.

The main innovations of this paper are summarized as follows:

1. A domain adaptation depth estimation framework with a joint perceptual loss is proposed. The input image is transformed by a generative adversarial network into a target image resembling a real scene image. A pre-trained network extracts high-level features of the input and target images, and the loss is measured in feature space to ensure that the high-level semantic features of the target image are preserved. The transformation network is trained with a combination of perceptual loss and adversarial loss, and the experimental results show that the proposed method performs well on the monocular depth estimation task.

2. To address the domain shift problem in domain adaptation depth estimation, joint training with a structured perceptual loss is proposed. During domain adaptation, the geometric structure of the transformed target image is prone to change, which introduces errors into the subsequent training of the depth estimation network. Therefore, a new loss function is proposed that uses a structural similarity measure to constrain the feature layers and ensure geometric consistency. In addition, to obtain depth maps with continuous boundaries, a spatial gradient loss is used to regularize the depth estimation network during training.

3. On the basis of the joint perceptual loss and domain adaptation depth estimation, an attention mechanism is integrated to optimize depth estimation. The attention mechanism is built into the depth estimation network to suppress interference from background information, so that depth-discriminative features play a greater role. In addition, dilated (atrous) convolution is used to alleviate the severe feature loss caused by excessive network depth. The final experimental results show improvements not only in overall depth accuracy but also in the detail quality of the estimated depth maps.
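The perceptual loss in innovation 1 compares images in the feature space of a pre-trained network rather than in pixel space. A minimal NumPy sketch of the idea, using a hypothetical fixed linear-plus-ReLU map as a stand-in for a real pre-trained CNN feature extractor (the extractor, weight shapes, and image sizes here are illustrative assumptions, not the thesis's actual network):

```python
import numpy as np

def feature_extractor(img, weights):
    # Stand-in for a pre-trained CNN layer: a fixed linear map + ReLU.
    # In practice this would be an intermediate feature map of a network
    # such as VGG, with frozen weights.
    return np.maximum(0.0, img.reshape(-1) @ weights)

def perceptual_loss(input_img, target_img, weights):
    # Mean squared error measured in feature space, not pixel space, so
    # high-level content is preserved while low-level appearance may differ.
    f_in = feature_extractor(input_img, weights)
    f_tg = feature_extractor(target_img, weights)
    return float(np.mean((f_in - f_tg) ** 2))

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 16)) / 8.0   # fixed "pre-trained" weights
img = rng.standard_normal((8, 8))

# Identical images give zero loss; a perturbed image gives a positive loss.
assert perceptual_loss(img, img, w) == 0.0
assert perceptual_loss(img, img + 0.1 * rng.standard_normal((8, 8)), w) > 0.0
```

During training, this scalar would be combined with the adversarial loss to supervise the image transformation network.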
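Innovation 2 combines two constraints: a structural term built on the structural similarity (SSIM) measure, and a spatial gradient term on the predicted depth. The sketch below uses a simplified global SSIM over the whole map and a plain L1 gradient penalty; the exact windowing and weighting in the thesis may differ, so treat this as an assumed minimal form:

```python
import numpy as np

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    # Simplified global SSIM (one window over the whole map); practical
    # implementations use local Gaussian windows instead.
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def structural_loss(feat_src, feat_tgt):
    # Penalize structural dissimilarity between source and transformed
    # features, constraining the geometry to stay consistent.
    return 1.0 - ssim_global(feat_src, feat_tgt)

def spatial_gradient_loss(depth):
    # L1 norm of horizontal and vertical depth gradients; small values
    # encourage piecewise-smooth depth with continuous boundaries.
    dx = np.abs(np.diff(depth, axis=1))
    dy = np.abs(np.diff(depth, axis=0))
    return float(dx.mean() + dy.mean())

ramp = np.tile(np.linspace(0.0, 1.0, 5), (5, 1))   # smooth depth ramp
assert structural_loss(ramp, ramp) < 1e-6          # identical structure -> ~0
assert spatial_gradient_loss(np.ones((5, 5))) == 0.0  # constant depth -> 0
```

In the joint objective, the structural term guards the transformed image's geometry while the gradient term regularizes the depth network's output.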
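Innovation 3 adds an attention mechanism and dilated convolution to the depth network. As a rough illustration, the sketch below shows a squeeze-and-excitation-style channel gate and a 1-D dilated convolution; the thesis does not specify these exact forms, so both are assumptions chosen only to make the two mechanisms concrete:

```python
import numpy as np

def channel_attention(feat):
    # SE-style channel attention (sketch): global average pooling yields
    # per-channel weights that re-scale the feature map, letting
    # depth-discriminative channels outweigh background ones.
    pooled = feat.mean(axis=(1, 2))                # (C,)
    weights = 1.0 / (1.0 + np.exp(-pooled))        # sigmoid gate in (0, 1)
    return feat * weights[:, None, None]

def dilated_conv1d(signal, kernel, dilation=1):
    # Dilated (atrous) convolution: kernel taps are spaced `dilation`
    # samples apart, enlarging the receptive field with no extra weights.
    k = len(kernel)
    span = (k - 1) * dilation + 1                  # effective receptive field
    out = np.empty(len(signal) - span + 1)
    for i in range(len(out)):
        out[i] = sum(kernel[j] * signal[i + j * dilation] for j in range(k))
    return out

feat = np.ones((2, 3, 3))
gated = channel_attention(feat)
assert gated.shape == feat.shape
assert np.all(gated < feat)                        # sigmoid gate is < 1

x = np.arange(10, dtype=float)
k = np.array([1.0, 1.0, 1.0])
# dilation=1 matches ordinary valid-mode convolution;
# dilation=2 covers a 5-sample receptive field with the same 3 taps.
assert np.allclose(dilated_conv1d(x, k, 1), x[:-2] + x[1:-1] + x[2:])
assert len(dilated_conv1d(x, k, 2)) == 10 - 5 + 1
```

The wider receptive field of the dilated kernel is what counteracts feature loss in very deep networks without adding parameters or downsampling.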