Deep Learning Based Monocular Scene Depth Estimation Algorithm

Posted on:2021-04-02

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Z Y Zhang

Full Text:PDF

GTID:1488306755460354

Subject:Control Science and Engineering

Abstract/Summary:

Monocular depth estimation is a fundamental problem in computer vision. Because it underpins many high-level 3D vision tasks and has promising applications, it has been receiving increasing attention. Monocular depth estimation aims to predict a depth map from a single image or video frame. Owing to the complexity of real-world scenes and the lack of robust geometric constraints, existing monocular depth estimation methods may fail to predict the fine details and precise scale of the depth map. To address this problem, this thesis proposes several new methods for depth estimation and related multi-task learning, based on efficiently designed deep neural network models and learning frameworks. The main contributions are as follows:

(1) A novel deep Hierarchical Guidance and Regularization (HGR) learning framework is proposed for end-to-end monocular depth estimation. It integrates a hierarchical depth guidance network with a hierarchical regularization learning method for fine-grained depth prediction. The framework has two key properties: (a) the hierarchical depth guidance network automatically learns hierarchical depth representations through supervision guidance and multiple side convolution operations on the base CNN, and uses these representations to progressively guide the upsampling and prediction process of the upper deconvolution layers; (b) the hierarchical regularization learning method integrates information from various levels of the depth maps, optimizing the network to predict depth maps whose structure is similar to the ground truth. Comprehensive evaluations on public benchmark datasets (NYU Depth V2, KITTI and Make3D) demonstrate the state-of-the-art performance of the proposed framework.

(2) Most existing monocular depth estimation methods, whether geometric or learning-based, lack an effective mechanism to preserve the cross-border details of depth maps, which are nevertheless very important for improving performance. To address this problem, a novel end-to-end Progressive Hard-mining Network (PHN) framework is proposed. Specifically, we construct a hard-mining objective function together with intra-scale and inter-scale refinement subnetworks to accurately localize and refine hard-mining regions. The intra-scale refinement block recursively recovers details of depth maps from different semantic features within the same receptive field, while the inter-scale block favors complementary interaction among multi-scale depth cues from different receptive fields. To further reduce the uncertainty of the network, we design a difficulty-aware refinement loss function to guide the depth learning process; it adaptively focuses on mining the hard regions where accumulated errors easily occur. The three modules work together to progressively reduce error propagation in the depth learning process and thereby improve monocular depth estimation performance. Comprehensive evaluations on several public benchmark datasets (NYU Depth V2, KITTI and Make3D) demonstrate the superiority of the PHN framework over other state-of-the-art methods for monocular depth estimation.

(3) A novel Task-Recursive Learning (TRL) framework is proposed to jointly and recurrently perform three representative tasks: depth estimation, surface normal prediction and semantic segmentation. TRL recursively refines the prediction results through a series of task-level interactions, where each cross-task interaction is abstracted as one network block at one time stage. In each stage, we serialize the tasks into a sequence and then recursively perform their interactions. To adaptively enhance counterpart patterns, we encapsulate the interactions in a Task-Attentional Module (TAM) so that the tasks mutually boost one another. Across stages, the historical experiences of previous task states are selectively propagated to later stages by a Feature-Selection unit (FS-Unit), which takes advantage of complementary information across tasks. The sequence of task-level interactions also evolves along a coarse-to-fine scale space so that the required details can be refined progressively. Finally, the task-abstracted sequence problem of multi-task prediction is framed as a recursive network. Extensive experiments on the NYU Depth V2 and SUN RGB-D datasets demonstrate that the method recursively refines the results of the three tasks and achieves state-of-the-art performance.

(4) A novel Pattern-Affinitive Propagation (PAP) framework is proposed to jointly predict depth, surface normals and semantic segmentation. The motivation comes from the statistical observation that pattern-affinitive pairs recur frequently across different tasks as well as within a task. Two types of propagation, cross-task and task-specific, are therefore conducted to adaptively diffuse these similar patterns. Cross-task propagation integrates cross-task affinity patterns and adapts them to each task by computing non-local relationships. Task-specific propagation then performs iterative diffusion in the feature space so that the cross-task affinity patterns spread widely within the task. Accordingly, the learning of each task is regularized and boosted by the complementary task-level affinities. Extensive experiments demonstrate the effectiveness and superiority of the method on the three joint tasks, with state-of-the-art or competitive results on three related datasets: NYU Depth V2, SUN RGB-D and KITTI.

(5) Online depth learning is the problem of continually adapting a depth estimation model to a continuously changing environment. This problem is challenging because the network easily overfits to the current environment and forgets its past experiences. To address it, a novel Learning to Prevent Forgetting (LPF) method is presented for online monocular depth adaptation to new target domains in an unsupervised manner. Instead of updating the universal parameters, LPF learns adapter modules that efficiently adjust the feature representation and distribution without losing the pre-learned knowledge under online conditions. Specifically, to adapt to temporally continuous depth patterns in videos, we introduce a novel meta-learning approach that learns the adapter modules by incorporating the online adaptation process into the learning objective. To further avoid overfitting, we propose a novel temporally consistent regularization that harmonizes the gradient descent procedure at each online learning step. Extensive evaluations on real-world datasets demonstrate that the proposed method, with a very limited number of parameters, significantly improves estimation quality.
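
The two propagation steps of the PAP framework in (4) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the cosine-plus-softmax affinity, the fixed mixing weights, the blending factor `beta`, all function names and the toy feature values are assumptions made only for illustration, and a real model would learn these quantities inside a CNN.

```python
import math


def affinity_matrix(features):
    """Non-local affinity between every pair of positions.

    features: list of N feature vectors (one per pixel/position).
    Returns an N x N matrix whose rows are non-negative and sum to 1.
    Assumption: affinity = softmax over cosine similarities.
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / (norm + 1e-8)

    rows = []
    for u in features:
        scores = [math.exp(cosine(u, v)) for v in features]
        total = sum(scores)
        rows.append([s / total for s in scores])
    return rows


def combine_affinities(matrices, weights):
    """Cross-task propagation (sketch): weighted sum of per-task affinities.

    Assumption: the weights are fixed and sum to 1; they are hypothetical.
    """
    n = len(matrices[0])
    return [
        [sum(w * m[i][j] for w, m in zip(weights, matrices)) for j in range(n)]
        for i in range(n)
    ]


def diffuse(features, affinity, steps=3, beta=0.5):
    """Task-specific propagation (sketch): iterative diffusion in feature space.

    Each step blends the original features with affinity-weighted averages
    of the current features, so similar positions share information.
    """
    channels = len(features[0])
    current = features
    for _ in range(steps):
        propagated = [
            [sum(a * f[c] for a, f in zip(row, current)) for c in range(channels)]
            for row in affinity
        ]
        current = [
            [(1 - beta) * f0 + beta * p for f0, p in zip(orig, prop)]
            for orig, prop in zip(features, propagated)
        ]
    return current


# Toy features for 4 positions and 2 channels per task (made-up values).
depth_feat = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
normal_feat = [[0.5, 0.5], [0.6, 0.4], [0.2, 0.8], [0.1, 0.9]]
seg_feat = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

task_affinities = [affinity_matrix(f) for f in (depth_feat, normal_feat, seg_feat)]
# Affinity adapted to the depth task: its own affinity gets the largest weight.
depth_affinity = combine_affinities(task_affinities, [0.5, 0.25, 0.25])
refined_depth = diffuse(depth_feat, depth_affinity)
```

Because every affinity row is a probability distribution, each diffusion step is a weighted average, so the refined features stay within the range of the input features while positions with similar patterns in any of the three tasks are pulled toward one another.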

Keywords/Search Tags:

Depth Estimation, Surface Normal Estimation, Semantic Segmentation, Scene Understanding, Deep Neural Network, Multi-task Learning, Online Learning, Affinity Learning

Related items

1	Scene Understanding Of Government Service Robot Based On Deep Learning
2	Pixel-wise Scene Understanding Based On Fully Convolutional Networks
3	Research On Key Technologies Of Monocular SLAM Based On Deep Learning Method
4	Deep Learning Based Depth Estimation From Monocular Image
5	Deep Learning Driven Scene Analysis And Semantic Target Analysis
6	Depth Estimation From A Monocular Image
7	Research On Image Segmentation Based On Multi-task Learning Deep Neural Networks
8	Monocular Depth Estimation From Image Sequence Based On Deep Learning
9	Image Segmentation And Its Applications In Image Depth Estimation
10	Monocular Image Depth Estimation Based On Deep Learning