Technologies such as autonomous driving, virtual reality, and augmented reality have developed rapidly in recent years. As an indispensable component of these technologies, depth estimation has received extensive attention from researchers. Monocular depth estimation based on deep learning uses the ability of convolutional neural networks to extract abstract features in order to recover complex depth cues from two-dimensional images, thereby avoiding the high cost of traditional hardware such as lidar and millimeter-wave radar, as well as the difficulty of embedding such equipment. Unsupervised monocular depth estimation usually adopts an encoder-decoder network, and effectively fusing high- and low-level feature information while reasonably exploiting the depth features of objects at various scales is a challenging task. In addition, phenomena that are common in real scenes, such as textureless regions and occlusion, also make the task harder for the network. This thesis analyzes and studies how to effectively fuse multi-scale information and how to handle textureless regions. The main innovations and results are as follows:

(1) Unsupervised monocular depth estimation based on dense feature fusion. To address the low feature reuse rate and insufficient fusion caused by same-level skip connections in the U-shaped encoder-decoder, an unsupervised monocular depth estimation method based on dense feature fusion is proposed. First, a dense feature fusion layer is designed to fuse high- and low-level features and low-resolution disparity maps by channel stacking and convolution (a code sketch is given after contribution (2) below). Then, the dense feature fusion layers are deployed between the encoder and decoder in the form of dense connections, replacing the previous skip connections and improving the reuse rate of the features at every layer. Finally, the encoder is channel-pruned to achieve a performance balance between the encoder and decoder. On the KITTI dataset, the threshold accuracy increases to 85%, the absolute relative error decreases to 0.122, and the remaining five indicators all improve. On the Make3D dataset, the absolute relative error drops to 0.497, and the other three indicators all improve.

(2) Unsupervised monocular depth estimation based on balanced multi-scale features. To address the feature dilution caused by fusing features through the U-shaped encoder-decoder with skip connections, the loss of spatial information in the feature extraction stage of the encoder, and the scale imbalance that often exists in a scene, an unsupervised monocular depth estimation method based on balanced multi-scale features is proposed. First, dilated convolutions are added to the last two blocks of the encoder, which reduces the number of downsampling operations and preserves more spatial detail. Then, a balanced multi-scale module is designed to extract multi-scale features by pooling, perform a balanced fusion operation, and use an attention mechanism to further refine the fused features, yielding rich, low-redundancy balanced multi-scale information (see the sketch below). Finally, the high-resolution, large-receptive-field features output by the encoder are fed into the balanced multi-scale module; the two cooperate and greatly improve network performance. On the KITTI dataset, the threshold accuracy increases to 88%, the absolute relative error decreases to 0.104, and the remaining five indicators all improve. On the Make3D dataset, the absolute relative error drops to 0.330, and the other three indicators all improve.
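The dense feature fusion layer of contribution (1) can be illustrated with a minimal PyTorch sketch: feature maps from different levels and a low-resolution disparity map are resized to a common resolution, stacked along the channel axis, and fused by a convolution. The class name `DenseFeatureFusion`, the ELU activation, and the bilinear resizing are illustrative assumptions rather than details taken from the thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseFeatureFusion(nn.Module):
    """Hypothetical dense feature fusion layer: resize all inputs to a common
    resolution, stack them along the channel axis, and fuse with a convolution."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.ELU(inplace=True),
        )

    def forward(self, features, low_res_disp=None):
        # `features` is a list of encoder/decoder feature maps from different levels;
        # `low_res_disp` is an optional lower-resolution disparity prediction.
        target_size = features[0].shape[-2:]
        inputs = [F.interpolate(f, size=target_size, mode="bilinear",
                                align_corners=False) for f in features]
        if low_res_disp is not None:
            inputs.append(F.interpolate(low_res_disp, size=target_size,
                                        mode="bilinear", align_corners=False))
        # Channel stacking followed by convolution, as described in contribution (1).
        return self.fuse(torch.cat(inputs, dim=1))
```

For example, fusing a 64-channel low-level map, a 256-channel high-level map, and a 1-channel disparity map would use `DenseFeatureFusion(64 + 256 + 1, 64)`.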
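Similarly, the balanced multi-scale module of contribution (2) can be sketched as follows, assuming pooling at a few fixed output sizes, an equal-weight average as the balanced fusion, and a squeeze-and-excitation style channel attention gate; all of these specifics are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BalancedMultiScale(nn.Module):
    """Hypothetical balanced multi-scale module: pool the input feature map to
    several scales, project each branch, average ("balance") the branches, and
    re-weight the result with a simple channel-attention gate."""

    def __init__(self, channels, pool_sizes=(1, 2, 4, 8)):
        super().__init__()
        self.pool_sizes = pool_sizes
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=1) for _ in pool_sizes
        )
        # Squeeze-and-excitation style channel attention (an assumption; the thesis
        # only states that an attention mechanism refines the fused features).
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        h, w = x.shape[-2:]
        scales = []
        for size, proj in zip(self.pool_sizes, self.branches):
            pooled = F.adaptive_avg_pool2d(x, output_size=size)
            scales.append(F.interpolate(proj(pooled), size=(h, w),
                                        mode="bilinear", align_corners=False))
        # Balanced fusion: every scale contributes equally before attention.
        fused = torch.stack(scales, dim=0).mean(dim=0)
        return fused * self.attention(fused) + x
```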
(3) Unsupervised monocular depth estimation for large textureless regions. To address the large textureless regions caused by extensive areas of water and sky in the USVInland dataset, three improvements are proposed: a disparity initialization loss, a horizontal gradient consistency loss, and a textureless mask. First, the textureless mask algorithm extracts the textureless regions of the image. Then, the disparity initialization loss is applied to the regions identified by the textureless mask, which reshapes the loss landscape of the whole network so that training is less likely to fall into a local minimum. Finally, the horizontal gradient consistency loss is applied to the regions identified by the textureless mask to make the predicted disparity as smooth as possible in the horizontal direction. The three components cooperate so that reasonable depths can also be predicted in textureless regions (a sketch of the mask and the two losses is given below). On the USVInland dataset, the threshold accuracy increases to 64.9%, the absolute relative error decreases to 0.37, and the other five indicators all improve.
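The three components of contribution (3) can be sketched with simple PyTorch functions: a local gradient-magnitude heuristic stands in for the textureless mask algorithm, and the two losses are evaluated only inside that mask. The window size, gradient threshold, prior disparity `init_disp`, and exact loss forms are hypothetical choices for illustration, not the thesis definitions.

```python
import torch
import torch.nn.functional as F

def textureless_mask(img, window=7, threshold=0.01):
    """Hypothetical textureless-mask heuristic: mark pixels whose local intensity
    gradient stays below a threshold (e.g. open water or sky), assuming img in [0, 1]."""
    gray = img.mean(dim=1, keepdim=True)                       # B x 1 x H x W
    gx = (gray[..., :, 1:] - gray[..., :, :-1]).abs()
    gy = (gray[..., 1:, :] - gray[..., :-1, :]).abs()
    grad = F.pad(gx, (0, 1)) + F.pad(gy, (0, 0, 0, 1))
    # Average gradient over a local window; a low average marks a textureless pixel.
    local = F.avg_pool2d(grad, kernel_size=window, stride=1, padding=window // 2)
    return (local < threshold).float()

def disparity_init_loss(disp, mask, init_disp=0.01):
    """Pull masked (textureless) pixels toward a small prior disparity so the
    network is less likely to settle in a poor local minimum there."""
    return (mask * (disp - init_disp).abs()).sum() / (mask.sum() + 1e-7)

def horizontal_gradient_loss(disp, mask):
    """Penalise horizontal disparity gradients inside the textureless mask,
    encouraging smooth depth along each image row."""
    dx = (disp[..., :, 1:] - disp[..., :, :-1]).abs()
    m = mask[..., :, 1:] * mask[..., :, :-1]
    return (m * dx).sum() / (m.sum() + 1e-7)
```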