Research On Monocular Image Depth Estimation Based On Deep Learning

Posted on: 2022-02-03; Degree: Master; Type: Thesis
Country: China; Candidate: Z H Li; Full Text: PDF
GTID: 2518306509965089; Subject: Computer technology
Abstract/Summary:
Image depth estimation is an important research topic in computer vision. Recovering depth information from a two-dimensional image helps a computer understand the three-dimensional structure of a scene, and it is widely applied in fields such as intelligent robotics, virtual reality, augmented reality, three-dimensional scene reconstruction, and autonomous driving. Compared with binocular and multi-view depth estimation methods, monocular depth estimation methods place fewer requirements on equipment and environment and are easier to deploy. However, from the perspective of geometric calculation, estimating three-dimensional information from a single two-dimensional view is a major challenge.

Traditional monocular depth estimation methods recover depth mainly from cues such as shadows and textures. Although they are easy to implement, their feature detection and matching are strongly limited by the external environment, and they generalize poorly. Monocular depth estimation methods based on deep learning make fuller use of object structure and other information in the image, so they offer robust feature representations and good generalization ability. They still have two problems: (a) object boundary information is partly lost and the scene structure is restored incompletely; (b) depth prediction in complex environments is unsatisfactory, in that some small-scale objects blend into the background and cannot be distinguished from it. Problem (a) arises because a traditional convolutional neural network aggregates only local features through its convolution kernels and does not model global relationships. Problem (b) arises because current methods cannot adequately represent the texture and geometric information in the scene.

In this paper, we develop deep-learning-based depth estimation in the following aspects:

(1) Widening the neural network can further improve model performance without increasing model complexity. We adaptively adjust the relative importance of feature channels through weighted calibration, which improves how well the network uses its features. On this basis, we propose a monocular image depth estimation network based on SE-ResNeXt, which improves the network's use of global information.

(2) We use a multi-level attention mechanism to retain the semantic information of the scene in different parts of the network. On this basis, the semantic features extracted by the encoder network are fused with the features of the corresponding (peer) layer in the decoder network, which helps the network better locate small-scale objects. Finally, a global feature enhancement module maintains denser contextual semantic information. The result is a monocular image depth estimation network based on multi-level attention and peer-layer fusion. This method improves depth prediction in complex environments and addresses the problem of small-scale objects blending into their surroundings.

In this paper, we systematically study the extraction of global feature information in depth estimation and the retention of detailed features in complex environments. Extensive experimental results and analysis show that the proposed methods significantly improve depth estimation performance. On the KITTI data set, the absolute relative error is reduced to 0.1 and the root mean square error is reduced to 4.421. The experimental results show that the methods developed in this paper obtain lower error values and also produce better visual results.
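For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how absolute relative error and root mean square error are commonly computed for depth maps. It is not taken from the thesis: the function name `depth_errors`, the `min_depth`/`max_depth` validity range (an 80 m cap is a common convention for KITTI, assumed here), and the toy arrays are all illustrative assumptions.

```python
import numpy as np


def depth_errors(pred, gt, min_depth=1e-3, max_depth=80.0):
    """Return (abs_rel, rmse) over pixels with valid ground truth.

    Hypothetical helper: the validity range is an assumed convention,
    not a setting confirmed by the thesis.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)

    # Ground-truth depth is typically sparse; ignore pixels without a
    # usable measurement (here: outside the assumed valid range).
    valid = (gt > min_depth) & (gt < max_depth)
    if not valid.any():
        raise ValueError("no valid ground-truth pixels")

    p, g = pred[valid], gt[valid]

    # Absolute relative error: mean of |prediction - truth| / truth.
    abs_rel = np.mean(np.abs(p - g) / g)

    # Root mean square error, in the same unit as the depth values.
    rmse = np.sqrt(np.mean((p - g) ** 2))

    return float(abs_rel), float(rmse)


# Toy example: 0.0 in the ground truth marks a missing measurement.
gt = np.array([[10.0, 20.0], [0.0, 40.0]])
pred = np.array([[11.0, 18.0], [5.0, 40.0]])
abs_rel, rmse = depth_errors(pred, gt)
```

Under these definitions, lower values are better for both metrics, and the absolute relative error is unitless while the root mean square error carries the unit of the depth values.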
Keywords/Search Tags: Monocular image depth estimation, Global features, Attention mechanism, Contextual semantic information