Each pixel of a depth map is a gray value that encodes the actual distance between the sensor and the target object. Monocular depth recovery is the task of recovering the corresponding depth map from a single RGB image. A two-dimensional RGB color image is the projection of three-dimensional space onto a plane, and the depth of each spatial point is lost in this projection. Recovering per-pixel depth from a single RGB image is therefore equivalent to inferring three-dimensional space from a two-dimensional image. Monocular depth recovery is widely used in autonomous driving, augmented reality, virtual reality, and 3D modeling. Deep learning methods trained on large-scale datasets have greatly improved monocular depth recovery, which can now produce accurate results at low cost and with modest computational requirements. However, monocular depth recovery is a dense prediction problem. Existing depth networks heavily compress the feature resolution during feature extraction, which causes deviation and loss of depth information at object edges, where depth changes sharply, and in regions far from the sensor.

This paper addresses these problems as follows. First, a monocular depth recovery method based on a fully convolutional network, a spatial attention mechanism, and transfer learning is proposed. On the one hand, models pretrained for image classification perform well and are easy to modify and retrain, so such a pretrained model is transferred into the proposed network as its depth feature encoder. On the other hand, attention mechanisms are widely used in natural language processing and perform well in semantic segmentation in computer vision, and accurate object segmentation edges benefit depth recovery. The spatial-domain attention module is therefore designed as the key component of the decoder of the proposed network. Second, to further improve depth recovery and the efficiency of the model, a hybrid-domain attention module that combines the channel domain and the spatial domain is proposed for the monocular depth recovery task. This module captures long-range dependencies along both the spatial and channel dimensions to integrate global information. Finally, both models are trained, validated, and tested on the indoor dataset NYU Depth v2.

Qualitative and quantitative comparisons with current state-of-the-art methods show that the proposed network models and training strategy perform slightly better than these methods. This paper also designs a method for evaluating the edge information of depth maps, and the results show that the proposed methods clearly outperform existing methods on some of the evaluation metrics. The model based on spatial-domain self-attention achieves a mean squared logarithmic error of 0.054 and a δ3 accuracy of 0.994, and produces smoother depth transitions. The model based on hybrid-domain self-attention achieves a δ2 accuracy of 0.972 and an absolute relative error of 0.127, generates more accurate object contours, and reaches an edge similarity of 0.7381.
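The error and accuracy figures quoted above are the usual single-image depth metrics. The Python/NumPy sketch below shows their common definitions: absolute relative error, RMSE, mean absolute log10 error, and threshold accuracy δi, the fraction of valid pixels whose ratio max(d/d*, d*/d) is below 1.25^i. It is a minimal sketch assuming these standard formulas. The function name `depth_metrics` and the `min_depth` validity cutoff are hypothetical. The abstract does not give the exact formula behind its "mean squared logarithmic error", so the log10 term here is an assumption rather than the thesis's definition.

```python
import numpy as np

def depth_metrics(pred, gt, min_depth=1e-3):
    """Common single-image depth metrics (hypothetical helper, standard formulas assumed)."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = gt > min_depth                      # skip pixels with missing ground-truth depth
    p = np.clip(pred[valid], min_depth, None)   # guard against log(0) and division by zero
    g = gt[valid]

    ratio = np.maximum(p / g, g / p)            # symmetric ratio used by the δ thresholds
    return {
        "abs_rel": float(np.mean(np.abs(p - g) / g)),
        "rmse": float(np.sqrt(np.mean((p - g) ** 2))),
        # Assumption: mean absolute log10 error, as commonly reported on NYU Depth v2.
        "log10": float(np.mean(np.abs(np.log10(p) - np.log10(g)))),
        "delta1": float(np.mean(ratio < 1.25)),
        "delta2": float(np.mean(ratio < 1.25 ** 2)),
        "delta3": float(np.mean(ratio < 1.25 ** 3)),
    }

# Toy example: the third pixel is predicted at twice its true depth.
m = depth_metrics(np.array([1.0, 2.0, 4.0]), np.array([1.0, 2.0, 2.0]))
# abs_rel = 1/3; ratios are [1, 1, 2], so delta1 = delta2 = delta3 = 2/3
```

Because a ratio of 2 exceeds 1.25^3 ≈ 1.953, the badly predicted pixel fails all three thresholds. This shows why δ3 values close to 1 indicate that almost no pixel is off by more than about a factor of two.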
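The spatial-domain and hybrid-domain attention modules are described above only at a high level. One common way to realize the idea is a simplified form of the position and channel self-attention used in dual-attention segmentation networks. This choice is an assumption, not the thesis's design. Spatial attention lets every pixel aggregate features from all other pixels. Channel attention lets every channel aggregate all other channels. The two branch outputs are then summed. The NumPy sketch below works on a single (C, H, W) feature map. The matrices `wq`, `wk`, and `wv` stand in for learned 1x1 convolutions, and `gamma` stands in for a learned residual scale. All function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)    # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(feat, wq, wk, wv, gamma=1.0):
    """Self-attention over the H*W positions of a (C, H, W) map.
    wq, wk: (C', C) and wv: (C, C) act like 1x1 convolutions (hypothetical weights)."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)                 # (C, N) with N = H*W positions
    q, k, v = wq @ x, wk @ x, wv @ x           # (C', N), (C', N), (C, N)
    attn = softmax(q.T @ k, axis=-1)           # (N, N): row i weights position i over all positions
    out = v @ attn.T                           # out[:, i] = sum_j attn[i, j] * v[:, j]
    return (gamma * out + x).reshape(c, h, w)  # residual connection

def channel_attention(feat, gamma=1.0):
    """Self-attention over the C channels of a (C, H, W) map."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)                 # (C, N)
    attn = softmax(x @ x.T, axis=-1)           # (C, C) channel-to-channel affinities
    out = attn @ x                             # each channel becomes a weighted mix of all channels
    return (gamma * out + x).reshape(c, h, w)

def hybrid_attention(feat, wq, wk, wv):
    """Hypothetical fusion: sum of the spatial and channel branch outputs."""
    return spatial_attention(feat, wq, wk, wv) + channel_attention(feat)

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 5))          # C=8 channels on a 4x5 grid
wq, wk = rng.standard_normal((2, 8)), rng.standard_normal((2, 8))
wv = rng.standard_normal((8, 8))
out = hybrid_attention(feat, wq, wk, wv)       # same shape as feat: (8, 4, 5)
```

Both branches build an affinity matrix over the whole map, so each output location depends on global context. This is the "long-range relationship" the hybrid module is meant to capture. In a trained decoder these outputs would be followed by learned convolutions, and summation is only one possible fusion choice.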