
Monocular Depth Estimation From Image Sequence Based On Deep Learning

Posted on: 2021-05-22  Degree: Master  Type: Thesis
Country: China  Candidate: H S Gao  Full Text: PDF
GTID: 2518306476952719  Subject: Control Engineering
Abstract/Summary:
As a fundamental research topic in computer vision, depth estimation has been applied to wide-ranging fields such as autonomous driving, 3D video and augmented reality. Monocular depth estimation is low-cost and broadly applicable, and has therefore become a research hotspot in recent years. Deep learning has made significant progress in many fields, for instance image classification, object detection and semantic segmentation. With the help of the powerful feature representation of deep models, many researchers have designed end-to-end monocular depth estimation algorithms and markedly improved performance. However, existing methods still have problems: most of them follow the static-environment assumption and ignore moving objects in the scene, which greatly limits their ability to predict monocular depth. This thesis explores unsupervised monocular depth estimation algorithms based on deep learning that use single-view image sequences as training data. The major contributions are as follows:

(1) The application scenarios and acquisition methods of depth information are presented, together with the research status and development of depth estimation at home and abroad, including the current classic methods. The main research difficulties of the field at this stage are also summarized.

(2) For static scenes, a multi-task deep learning framework is proposed to estimate monocular depth and camera pose simultaneously. During training, the loss function is derived from the geometric consistency of the image sequence under the pinhole camera model.

(3) To adapt to more general dynamic scenes, an additional module that predicts the optical flow of the image sequence in an unsupervised manner is added to the model. Pixels belonging to dynamic objects are then detected by comparing the predicted full flow with the rigid flow induced by the camera's ego-motion, which mitigates the influence of moving objects in the scene (a minimal sketch of this comparison is given below). In addition, unlike mainstream CNN-based optical flow models, the proposed module is designed as a generative adversarial network. The adversarial architecture learns directly from the data distribution, which improves the robustness and accuracy of the optical flow prediction.

(4) The proposed multi-task learning method is compared with current state-of-the-art algorithms in depth estimation, optical flow prediction and camera motion estimation. The experimental results show that the proposed model achieves performance comparable to supervised methods and outperforms existing unsupervised methods in depth estimation and camera motion estimation. Furthermore, a series of ablation experiments is carried out to verify the rationality and effectiveness of the proposed improvements.
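To illustrate how the predicted full flow and the ego-motion-induced rigid flow mentioned in contribution (3) can be compared, the following is a minimal PyTorch-style sketch. It is not the implementation from the thesis: the tensor layout, the function names rigid_flow and moving_object_mask, and the fixed pixel threshold are assumptions made only for illustration.

import torch

def rigid_flow(depth, K, K_inv, T):
    """Optical flow induced purely by camera ego-motion (pinhole model).

    depth : (B, 1, H, W) predicted depth of the source frame
    K     : (B, 3, 3) camera intrinsics, K_inv its inverse
    T     : (B, 3, 4) relative camera pose [R | t] from source to target
    Returns rigid flow of shape (B, 2, H, W).
    """
    B, _, H, W = depth.shape
    # Pixel grid in homogeneous coordinates, shape (3, H*W)
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float().reshape(3, -1)
    pix = pix.unsqueeze(0).expand(B, -1, -1).to(depth.device)

    # Back-project pixels to 3-D camera coordinates using the predicted depth
    cam = depth.reshape(B, 1, -1) * (K_inv @ pix)                     # (B, 3, H*W)
    cam_h = torch.cat([cam, torch.ones(B, 1, H * W, device=depth.device)], dim=1)

    # Transform into the target frame and project back to the image plane
    proj = K @ (T @ cam_h)                                            # (B, 3, H*W)
    proj = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)

    return (proj - pix[:, :2]).reshape(B, 2, H, W)

def moving_object_mask(full_flow, rigid, thresh=3.0):
    """Mark pixels whose predicted full flow deviates from the rigid flow."""
    return (full_flow - rigid).norm(dim=1, keepdim=True) > thresh

In an unsupervised pipeline of this kind, such a mask would typically be used to exclude the detected moving pixels from the geometric-consistency (photometric) loss, so that moving objects do not corrupt the depth and pose supervision signal.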
Keywords/Search Tags:Monocular depth, Unsupervised learning, Deep learning, Optical flow, Multi-task learning