Research On Monocular Depth Prediction Algorithm Based On Deep Learning

Posted on: 2022-12-04  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Y F Yang  Full Text: PDF
GTID: 1488306764499034  Subject: Automation Technology
Abstract/Summary:
Monocular depth estimation is the task of estimating, from a single RGB image, the distance from the camera to the scene point corresponding to each pixel. As a fundamental task in 3D vision, it provides basic depth information for research on robot obstacle avoidance, autonomous driving, and virtual reality. Current depth-sensing equipment mainly includes lidar, millimeter-wave radar, time-of-flight cameras, binocular cameras, multi-camera systems, and structured-light cameras. These devices are variously affected by the external environment, limited to short measurement ranges, or computationally expensive, so it is difficult to balance cost against effective range when using them for distance measurement. Monocular depth prediction requires only one RGB camera and performs pixel-level depth prediction algorithmically. Because its hardware cost is extremely low, it has good prospects for development and engineering application. At present, accuracy remains the primary consideration in autonomous driving and robot obstacle avoidance, and supervised methods are more accurate than unsupervised ones. This dissertation therefore studies supervised monocular depth prediction based on deep learning. Focusing on three technical difficulties (fusing global and local depth information, refining depth contours, and making the algorithm lightweight), it carries out theoretical analysis, method research, technical implementation, and experimental verification. The main research contents are as follows:

(1) Research on a Monocular Depth Prediction Network Based on a Multi-scale U-shaped Network. To better fuse local and global depth information, and to better exploit global and local features such as shape, color, and texture for depth prediction, a mixed-scale Unet framework based on a dense atrous pyramid is proposed. The Unet++ structure from the field of image segmentation is introduced into monocular depth prediction; within the Unet++ framework, the number of convolutional layers is reset and the decoder is densely connected. By choosing appropriate atrous rates, the connecting part between the encoder and the decoder forms a dense atrous pyramid over different feature layers, which better links the features in the deep and shallow layers of the network. In tests on the KITTI and NYU Depth V2 datasets, the proposed method outperforms most methods of the same type and performs particularly well on the squared relative error and root mean squared error metrics. It comprehensively fuses global and local information and improves the prediction accuracy of the network.

(2) Research on a Monocular Depth Prediction Loss Function Based on Boundary Constraints. To address the blurred depth predictions caused by sparse ground-truth depth maps, a strong boundary constraint loss function is proposed. It consists of a weighted scale-invariant loss, a pairwise ranking loss, and a robust ordinal depth loss. For the pairwise ranking loss, a cross-edge point pair sampling method is proposed. The Sobel operator is used to extract image edges, and the neighborhood of each edge pixel is divided into four regions by the gray-level gradient direction at that pixel and the direction of one of the image's coordinate axes. Pixels are randomly selected from the four regions to form three point pairs, and the line connecting each pair is guaranteed to cross the boundary contour. The size of the resulting set of point pairs is controlled by the number of contour pixels that are sampled. Tests of different network models on authoritative public datasets demonstrate the effectiveness of the proposed loss function. A network trained with the strong boundary constraint loss was also evaluated on self-collected data and achieved good results, which verifies that the proposed loss function improves the robustness of the network to different data.

(3) Research on a Transformer-Type Monocular Depth Prediction Network Based on Semantic Information. To make the model lightweight without reducing prediction accuracy, a segmentation-guided, Unet-like Swin Transformer depth prediction network is proposed. Because the Transformer has general modeling capability, the network is built on the Swin-Unet framework, and a convolutional residual module is introduced in the skip connections. This increases the depth of the network without causing network degradation, so the network can learn the mapping better. In the decoder, a guided nearest expanding module is proposed to upsample the patches. The module takes the semantic filter image corresponding to the RGB image as input and obtains foreground and background mask maps. The feature map is guided and filtered with these masks, passed through a convolutional layer, and finally upsampled with nearest-neighbor interpolation. Extensive experiments on the KITTI and NYU Depth V2 datasets show that the proposed network is effective: prediction accuracy improves even though the amount of computation and the storage footprint of the network are reduced.
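
The role of the atrous rate in contribution (1) can be illustrated with simple receptive-field arithmetic. The following is a minimal Python sketch, not the dissertation's code: the 3x3 kernel and the rates 1, 2, 4 are hypothetical values chosen for illustration, and both function names are introduced here.

```python
def effective_kernel(kernel_size, rate):
    """Spatial extent covered by one atrous (dilated) convolution kernel."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def stacked_receptive_field(kernel_size, rates):
    """Receptive field of stride-1 atrous convolutions applied in sequence."""
    rf = 1
    for rate in rates:
        # Each layer widens the field by (effective kernel extent - 1).
        rf += effective_kernel(kernel_size, rate) - 1
    return rf


# Hypothetical pyramid: three 3x3 layers with rates 1, 2 and 4.
for rate in (1, 2, 4):
    print(rate, effective_kernel(3, rate))
print(stacked_receptive_field(3, [1, 2, 4]))
```

Larger rates widen the context seen by each output pixel without adding weights, which is why the choice of rates matters when a pyramid is meant to connect shallow (local) and deep (global) features.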
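
The two metrics singled out in contribution (1), squared relative error and root mean squared error, are commonly defined as in the sketch below. This assumes the usual depth-benchmark convention that pixels without ground truth are stored as 0 and skipped; the function name and the flat-list input format are assumptions made for illustration.

```python
import math


def depth_errors(pred, gt):
    """Return (squared relative error, RMSE) over pixels with valid ground truth.

    pred, gt: flat sequences of depths; gt <= 0 marks a missing measurement.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    n = len(pairs)
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    return sq_rel, rmse


# The third pixel has no ground truth and is ignored.
sq_rel, rmse = depth_errors([2.0, 4.0, 9.0], [2.0, 5.0, 0.0])
print(sq_rel, rmse)
```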
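
Two of the loss terms named in contribution (2) have widely used textbook forms, sketched below in Python. This is an illustration under assumptions, not the dissertation's formulation: the abstract does not give the weighting scheme, so `lam` is a hypothetical balance factor, and the ranking loss uses the common logistic form with labels +1, -1, and 0. The robust ordinal depth loss is not sketched because the abstract does not describe it.

```python
import math


def scale_invariant_loss(pred, gt, lam=0.5):
    """Scale-invariant log-depth loss over valid pixels (gt > 0, pred > 0).

    lam = 1 makes the loss fully invariant to a global scale factor.
    """
    d = [math.log(p) - math.log(g) for p, g in zip(pred, gt) if g > 0]
    if not d:
        raise ValueError("no valid ground-truth pixels")
    n = len(d)
    return sum(x * x for x in d) / n - lam * (sum(d) / n) ** 2


def ranking_loss(pred_a, pred_b, label):
    """Pairwise ranking loss for one point pair.

    label = +1 if point a is farther than b, -1 if closer, 0 if about equal.
    """
    diff = pred_a - pred_b
    if label == 0:
        return diff * diff  # equal-depth pairs are pulled together
    return math.log1p(math.exp(-label * diff))  # penalize the wrong order


gt = [1.0, 2.0, 4.0]
pred = [2.0, 4.0, 8.0]  # correct up to a global scale of 2
print(scale_invariant_loss(pred, gt, lam=1.0))
print(ranking_loss(3.0, 1.0, 1), ranking_loss(1.0, 3.0, 1))
```

The ranking term depends only on depth order, so it can supervise pairs of pixels chosen near object boundaries even when dense ground truth is unavailable.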
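
The idea behind cross-edge point pair sampling in contribution (2) can be sketched as follows. This is a simplified stand-in, not the method described above: instead of the four-region partition, it draws each pair from the two sides of the edge along the Sobel gradient direction, which still guarantees that the connecting line crosses the contour. The function names, `max_offset`, and `n_pairs` are hypothetical.

```python
import math
import random

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_gradient(img, r, c):
    """Sobel gradient (gx, gy) at interior pixel (r, c) of a 2D list image."""
    gx = gy = 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            v = img[r + dr][c + dc]
            gx += SOBEL_X[dr + 1][dc + 1] * v
            gy += SOBEL_Y[dr + 1][dc + 1] * v
    return gx, gy


def cross_edge_pairs(img, r, c, max_offset=3, n_pairs=3, rng=random):
    """Sample point pairs that straddle the edge passing through (r, c)."""
    gx, gy = sobel_gradient(img, r, c)
    norm = math.hypot(gx, gy)
    if norm == 0:
        return []  # flat neighborhood: no edge direction to sample along
    ux, uy = gx / norm, gy / norm  # x runs along columns, y along rows
    h, w = len(img), len(img[0])

    def point_at(t):
        # Step t pixels along the gradient, clamped to the image bounds.
        pr = min(max(round(r + t * uy), 0), h - 1)
        pc = min(max(round(c + t * ux), 0), w - 1)
        return pr, pc

    pairs = []
    for _ in range(n_pairs):
        before = point_at(-rng.randint(1, max_offset))
        after = point_at(rng.randint(1, max_offset))
        pairs.append((before, after))
    return pairs


# Vertical step edge between columns 3 and 4 of an 8x8 image.
image = [[0] * 4 + [1] * 4 for _ in range(8)]
print(cross_edge_pairs(image, 4, 4, rng=random.Random(0)))
```

Each sampled pair would then receive an ordinal label from the ground-truth depths and be fed to a pairwise ranking loss. Near the image border the clamping can pull both points onto the same side, so a fuller implementation would discard such pairs.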
Keywords/Search Tags: Monocular Depth Prediction, Atrous Convolution Pyramid, Encoder-Decoder Structure, Image Segmentation, Transformer Structure