Image depth estimation is an important part of how computers understand the three-dimensional geometry of a scene, and it is a hot topic in computer vision. Monocular image depth estimation is inherently ill-posed, because a single image is consistent with infinitely many scene depths, so researchers paid little attention to it in the past. With the development of deep learning, however, it has become an active research topic. Building on existing research, this thesis proposes a monocular image depth estimation method based on a deep convolutional neural network (DCNN). The method improves on previous work in three respects: the network model, the model training method, and the loss function. Together, these changes effectively improve the accuracy of monocular depth estimation.

First, because image depth estimation is a dense (per-pixel) prediction task, this thesis designs a DCNN model with three parts: an encoder, a multi-scale feature extractor, and a decoder. The encoder extracts abstract image features through convolution and downsampling. The multi-scale feature extractor then extracts multi-scale features from the encoder's output feature map using atrous convolutions with different dilation rates. The decoder upsamples the output of the multi-scale feature extractor with deconvolution, so that the final depth map has the same resolution as the input image. Compared with a typical encoder-decoder model, this design replaces part of the downsampling and upsampling operations with the multi-scale feature extractor. It therefore retains more detail and uses fewer network parameters, which improves both prediction accuracy and efficiency.

Second, because ground-truth depth maps of outdoor scenes are sparse, this thesis trains the network with a semi-supervised learning method. On the one hand, the ground-truth depth map serves as the label for supervised learning. On the other hand, the principle of binocular stereo vision is used to turn depth estimation into an image reconstruction problem for unsupervised learning. Compared with purely supervised or purely unsupervised learning, this method preserves the accuracy of the estimates while relaxing the requirements on the completeness and density of the ground-truth depth maps used for training.

Finally, this thesis designs a loss function that is a linearly weighted sum of a depth estimation loss, an image reconstruction loss, and a depth map smoothness loss. The depth estimation loss corresponds to supervised learning: it uses an adaptive Huber loss to measure the log-domain error between the ground-truth and predicted depth maps. The image reconstruction loss corresponds to unsupervised learning and measures the similarity error between the reconstructed image and the real image. The depth map smoothness loss is based on the continuity of scene depth: it penalizes the log-domain gradient of the depth map, weighted by edge information from the input image, so that noise in the depth map is removed while edges are preserved.

Experimental results show that the method runs at 18.9 ms per frame and achieves a mean absolute relative error below 10% on the KITTI dataset. This thesis also compares its results with those of existing monocular depth estimation algorithms. On seven evaluation metrics, including RMSE and mean absolute relative error, the proposed method achieves significantly higher depth estimation accuracy.
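The multi-scale feature extractor relies on atrous (dilated) convolution. The Python sketch below works in one dimension for brevity and shows why this operation can replace some downsampling: a k-tap kernel with dilation rate r covers (k - 1) * r + 1 input samples, yet the output keeps the input length. The function names and the rates (1, 2, 4) are hypothetical illustrations, not the configuration used in this thesis.

```python
def receptive_field(kernel_size, rate):
    # Input samples covered by one dilated kernel: (k - 1) * rate + 1.
    return (kernel_size - 1) * rate + 1


def atrous_conv1d(signal, kernel, rate):
    """1D atrous (dilated) convolution with zero padding that keeps the
    output the same length as the input (no downsampling).

    As in CNN layers, this is computed as cross-correlation:
    output[i] = sum_j kernel[j] * input[i + j * rate - pad].
    """
    k = len(kernel)
    span = (k - 1) * rate          # receptive field minus one
    pad = span // 2                # left padding; the remainder goes on the right
    padded = [0.0] * pad + [float(v) for v in signal] + [0.0] * (span - pad)
    return [
        sum(kernel[j] * padded[i + j * rate] for j in range(k))
        for i in range(len(signal))
    ]


def multi_scale_features(signal, kernel, rates=(1, 2, 4)):
    # One branch per dilation rate (rates are hypothetical). Every branch
    # keeps the input length, so the branches can be stacked as channels.
    return [atrous_conv1d(signal, kernel, r) for r in rates]


impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
for branch in multi_scale_features(impulse, [1, 1, 1]):
    print(branch)
```

Given a single impulse at position 4, the rate-1 branch responds at positions 3 to 5, the rate-2 branch at 2, 4 and 6, and the rate-4 branch at 0, 4 and 8. All three outputs have length 9, so wider context is gathered without lowering resolution.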
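The unsupervised branch turns depth estimation into image reconstruction through stereo geometry. The following is a minimal sketch that assumes a rectified stereo pair and works on a single image row. Predicted depth Z is converted to disparity d = f * B / Z. The right image is then sampled at column x - d to reconstruct the left image, and the result is compared with the real left image. The camera parameters are hypothetical, and a plain L1 error stands in for the thesis's similarity error, which may be defined differently.

```python
def depth_to_disparity(depth, focal_px, baseline_m):
    # Rectified stereo pair: disparity (pixels) = focal length * baseline / depth.
    return [focal_px * baseline_m / z for z in depth]


def sample_linear(row, x):
    # Linearly interpolate row at fractional column x, clamping to the edges.
    x = min(max(x, 0.0), len(row) - 1.0)
    x0 = int(x)                      # floor, since x >= 0 after clamping
    x1 = min(x0 + 1, len(row) - 1)
    w = x - x0
    return (1.0 - w) * row[x0] + w * row[x1]


def reconstruct_left(right_row, disparity):
    # A scene point at column x in the left image appears at column
    # x - disparity in the right image, so sample the right row there.
    return [sample_linear(right_row, x - d) for x, d in enumerate(disparity)]


def reconstruction_l1(left_row, recon_row):
    # Mean absolute photometric error (assumed stand-in similarity measure).
    return sum(abs(a - b) for a, b in zip(left_row, recon_row)) / len(left_row)


# Hypothetical camera parameters and one image row, for illustration only.
FOCAL_PX, BASELINE_M = 700.0, 0.5
depth_row = [350.0] * 6              # all pixels 350 m away -> 1 px disparity
right_row = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
left_row = [0.0, 0.0, 10.0, 20.0, 30.0, 40.0]

disp = depth_to_disparity(depth_row, FOCAL_PX, BASELINE_M)
recon = reconstruct_left(right_row, disp)
print(disp[0], reconstruction_l1(left_row, recon))
```

Because the reconstruction error is differentiable with respect to the predicted depth (through the interpolation weights), it can supervise pixels that have no ground-truth depth.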
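The depth estimation and smoothness terms of the loss function can be sketched in the same one-row setting. This sketch assumes that the adaptive Huber loss is the reverse Huber (berHu) form, with its threshold c set to 20% of the largest absolute log-depth residual. That is a common choice in depth estimation, but the thesis may define the adaptation differently. The exp(-|image gradient|) edge weight and the term weights in total_loss are likewise hypothetical.

```python
import math


def berhu_log_loss(pred, gt, c_ratio=0.2):
    """Reverse Huber (berHu) loss on log-depth residuals.

    Only pixels with ground truth (gt > 0) contribute, since outdoor
    ground-truth depth maps are sparse. The threshold c adapts to each
    batch as c_ratio * max |residual| (an assumed choice).
    """
    res = [math.log(p) - math.log(g) for p, g in zip(pred, gt) if g > 0]
    if not res:
        return 0.0
    c = c_ratio * max(abs(r) for r in res)
    if c == 0.0:
        return 0.0                   # perfect prediction on all valid pixels
    total = 0.0
    for r in res:
        a = abs(r)
        # L1 for small residuals, scaled L2 for large ones (continuous at a == c).
        total += a if a <= c else (a * a + c * c) / (2.0 * c)
    return total / len(res)


def edge_aware_smoothness(pred, image):
    # Log-depth gradient weighted by exp(-|image gradient|): flat image
    # regions enforce smooth depth, while strong image edges relax the penalty.
    if len(pred) < 2:
        return 0.0
    terms = [
        abs(math.log(pred[x + 1]) - math.log(pred[x]))
        * math.exp(-abs(image[x + 1] - image[x]))
        for x in range(len(pred) - 1)
    ]
    return sum(terms) / len(terms)


def total_loss(l_depth, l_recon, l_smooth, w_depth=1.0, w_recon=1.0, w_smooth=0.1):
    # Linear weighting of the three terms; the weights are hypothetical.
    return w_depth * l_depth + w_recon * l_recon + w_smooth * l_smooth


pred = [1.0, math.e, 5.0]
gt = [1.0, 1.0, 0.0]                 # third pixel has no ground-truth depth
image = [0.0, 0.0, 5.0]
l_d = berhu_log_loss(pred, gt)
l_s = edge_aware_smoothness(pred, image)
print(l_d, l_s, total_loss(l_d, 0.0, l_s))
```

The berHu form behaves like L1 for small residuals and like a scaled L2 for large ones, so large errors receive stronger gradients without letting small errors vanish.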
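For reference, the two evaluation indicators named in the results, mean absolute relative error and RMSE, can be computed over pixels that have ground truth as shown below. This is a plain Python sketch on illustrative data; the other indicators used in the seven-metric comparison are not reproduced here.

```python
import math


def _valid_pairs(pred, gt):
    # Evaluate only where ground truth exists (gt > 0).
    return [(p, g) for p, g in zip(pred, gt) if g > 0]


def abs_rel(pred, gt):
    # Mean absolute relative error: mean(|pred - gt| / gt).
    pairs = _valid_pairs(pred, gt)
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def rmse(pred, gt):
    # Root mean squared error, in the units of the depth map (e.g. metres).
    pairs = _valid_pairs(pred, gt)
    return math.sqrt(sum((p - g) ** 2 for p, g in pairs) / len(pairs))


pred = [9.0, 22.0, 5.0]
gt = [10.0, 20.0, 0.0]               # last pixel has no ground truth
print(abs_rel(pred, gt), rmse(pred, gt))
```

An abs_rel value below 0.1 corresponds to the "less than 10%" mean absolute relative error reported on KITTI.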