Research On Depth Estimation Algorithm Based On Monocular Indoor Scenes

Posted on: 2021-06-04
Degree: Master
Type: Thesis
Country: China
Candidate: J M Zhang
Full Text: PDF
GTID: 2518306548482394
Subject: IC Engineering
Abstract/Summary:
Depth estimation has long been an important research task in computer vision. Early methods obtained depth information from geometric constraints in the scene. Later, monocular depth estimation algorithms began to appear. Recently, with the rapid development of artificial intelligence, deep learning has become a very active research direction, and depth estimation algorithms based on deep learning are developing rapidly. As a result, depth estimation from a single image has found wider application, for example in scene ranging, robot navigation, and autonomous driving. However, most current depth estimation algorithms are designed mainly for outdoor scenes. Indoor scenes lack distinctive global or local features, are complex, and have very dense and continuous depth values. Depth estimation for indoor scenes therefore still awaits a breakthrough.

To address depth estimation of indoor scenes from a single image, we propose an encoder-decoder structure based on multiple networks: Fully Convolutional Networks (FCN) combined with Squeeze-and-Excitation Networks (SENet) and with Residual Networks (ResNet), respectively. Our model is trained end-to-end and does not rely on any post-processing techniques. First, a Fully Convolutional Squeeze-and-Excitation module (FCSE block) is designed by combining FCN and SENet. Following a set of formulas, FCSE blocks and convolution layers are arranged alternately in the encoder, which outputs high-quality feature maps by exploiting both spatial and channel information. Then, using the skip connections characteristic of ResNet, FCN and ResNet are combined and, following a set of formulas, arranged in the decoder to deepen the decoder network. This restores the depth information of the down-sampled feature maps more completely and improves the accuracy of the depth map. Our model also reduces the time spent in training. Finally, the L1 loss function is used to optimize our model.

The proposed algorithm is trained and tested on NYU Depth v2, the most commonly used indoor dataset. The experimental results show that, compared with other existing monocular depth estimation methods, our model not only removes the cumbersome process of refining coarse depth maps but also predicts depth maps with higher precision, reducing the error rate by at least 1.6% and improving threshold accuracy by at least 0.5%. The average running time of the network is 21 ms, which lays a foundation for real-time detection. Finally, the scene ranging results show that the average error between the distances measured by the algorithm and the actual distances is 4.40%.
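The quantities named in the abstract (L1 loss, error rate, threshold accuracy) can be illustrated with a minimal sketch. The abstract does not define the last two, so this sketch assumes the definitions commonly reported on NYU Depth v2: mean absolute relative error, and the fraction of pixels whose ratio max(pred/gt, gt/pred) falls below 1.25. The function names and sample values are hypothetical and are not taken from the thesis.

```python
from typing import Sequence


def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute difference between predicted and ground-truth depths."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mean_relative_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Assumed 'error rate': mean of |pred - gt| / gt (gt must be positive)."""
    return sum(abs(p - t) / t for p, t in zip(pred, target)) / len(pred)


def threshold_accuracy(
    pred: Sequence[float], target: Sequence[float], threshold: float = 1.25
) -> float:
    """Assumed 'threshold accuracy': share of pixels with
    max(pred/gt, gt/pred) < threshold (all depths must be positive)."""
    hits = sum(1 for p, t in zip(pred, target) if max(p / t, t / p) < threshold)
    return hits / len(pred)


# Hypothetical depths in metres for four pixels, flattened into lists.
pred = [1.0, 2.2, 3.0, 5.0]
target = [1.0, 2.0, 4.0, 5.0]

print("L1 loss:", l1_loss(pred, target))
print("Mean relative error:", mean_relative_error(pred, target))
print("Threshold accuracy:", threshold_accuracy(pred, target))
```

Under these assumed definitions, the 4.40% figure reported for scene ranging would correspond to a mean relative error of 0.044 between measured and actual distances.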
Keywords/Search Tags: Monocular images, Indoor scenes, Depth estimation, Deep learning