
Deep Learning Based Depth Estimation From Monocular Image

Posted on: 2018-05-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Li
Full Text: PDF
GTID: 2348330512487403
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
3D scene analysis is an important topic in the field of computer vision, and depth estimation is an important component of understanding the 3D geometric relations within a scene. Fusing reasonably accurate depth information has been shown to improve many other computer vision tasks relative to their RGB-only counterparts, for example semantic segmentation, pose estimation and object detection. Traditional methods of estimating depth from a monocular image enforce optical geometric constraints or environmental assumptions, such as Structure from Motion (SFM), focus, or variations in illumination. In the absence of such constraints or assumptions, however, developing a computer vision system capable of accurately estimating depth maps by exploiting monocular cues is a challenging task. There are two difficulties. One is that a typical computer vision system cannot extract the cues used for inferring 3D structure from a monocular image the way the human brain does. The other is that the task is technically an ill-posed problem: a 2D image may correspond to an infinite number of possible real-world scenes. This inherent ambiguity of mapping a single image to a depth map means that vision models cannot estimate accurate depth values from a single image alone. To address these two problems, we propose the following two methods. First, we propose a computer vision model that combines deep convolutional neural networks (CNN) and continuous conditional random fields (CCRF) in a unified deep learning framework: the CNN extracts rich depth-related features from the image, and the CCRF optimizes the CNN's output according to the position and color information of the image pixels. Second, we propose a vision model that fuses sparse known labels. These known labels provide the model with a few relatively accurate depth values as a reference and greatly narrow down the range of reasonable depth values for the remaining pixels, allowing the model to reduce, to some extent, the ambiguity of mapping a single RGB image to a depth map.

To summarize, we survey the latest developments in depth estimation from monocular images, including related datasets, research methods and results; we analyze existing problems and discuss future research directions. Specifically, we propose a computer vision model that learns a depth-information representation from a monocular image. Considering the ill-posedness of this problem, we further propose a vision model that fuses sparse known labels, reducing the ambiguity of mapping a single monocular image to a depth map. Results on the NYU Depth v2 dataset verify the effectiveness and superiority of the two models.
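The two ideas described in the abstract (CCRF refinement of CNN depth predictions using pixel position/color affinities, and fusion of sparse known depth labels) can be illustrated with a minimal NumPy sketch. This is not the thesis's implementation: the fixed-point update, the Gaussian affinity over per-pixel features, and the function names (`ccrf_refine`, `fuse_sparse_labels`) are illustrative assumptions, and a real model would operate on dense feature maps rather than a handful of pixels.

```python
import numpy as np

def gaussian_affinity(feats, sigma):
    # Pairwise similarity from feature distances; feats holds one row per
    # pixel (e.g. position and color), shape (n_pixels, n_features).
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def ccrf_refine(depth, feats, sigma=1.0, alpha=0.5, n_iters=10):
    # Simplified fixed-point iteration in the spirit of a continuous CRF:
    # each pixel's depth is pulled toward an affinity-weighted average of
    # the other pixels while staying anchored to the CNN's unary prediction.
    unary = depth.copy()
    W = gaussian_affinity(feats, sigma)
    np.fill_diagonal(W, 0.0)              # no self-affinity
    W /= W.sum(1, keepdims=True) + 1e-12  # row-normalize the weights
    z = depth.copy()
    for _ in range(n_iters):
        z = (1 - alpha) * unary + alpha * (W @ z)
    return z

def fuse_sparse_labels(depth, known_idx, known_vals):
    # Clamp the few pixels with known (measured) depth to their values,
    # providing reliable anchors for the rest of the map.
    z = depth.copy()
    z[known_idx] = known_vals
    return z
```

Because the refinement step is a convex combination of the unary prediction and a smoothed field, it suppresses noisy per-pixel estimates without drifting far from the CNN output; the sparse-label step then injects the trusted measurements directly.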
Keywords/Search Tags:Depth estimation, Deep learning, Scene understanding, Conditional random field, Sparse known labels