Research On Monocular Depth Estimation Based On Feature Fusion

Posted on: 2024-09-23
Degree: Master
Type: Thesis
Country: China
Candidate: G J Hu
GTID: 2568307106475714
Subject: Electronic information

Abstract/Summary:
Estimating dense depth from a single color image is an active research direction in computer vision with important practical uses: autonomous driving, intelligent robots and three-dimensional scene reconstruction all depend on it. In recent years, breakthroughs in natural language processing (NLP) and in image classification, object detection and segmentation have advanced deep learning in general and depth estimation research in particular. This thesis focuses on monocular depth estimation. As Transformers have become popular for image classification, object detection, segmentation and other vision tasks, many researchers have tried to apply them to depth estimation. This thesis integrates a Transformer with a conventional convolutional model, improving performance on the public autonomous driving dataset KITTI and the indoor dataset NYU_V2, and validates the model's improved generalization on the additional SUNRGBD dataset. The data augmentation method is also optimized to improve prediction performance, and the loss function is modified to improve prediction on distant targets. The main innovations are as follows:

(1) A new monocular depth estimation model structure based on depth information perception is proposed. The label depth map is fed into the training process as an additional input. During training, a pixel-level divergence loss (PKL) constrains the model so that, in the encoding stage, it learns to perceive the hidden features of the true depth from a single color image. This increases the similarity between the encoded features and the target depth map and improves the final prediction. Viewed another way, it is a process of fusing the encoded features of the depth map with those of the color image.

(2) A monocular depth estimation model structure based on dual encoding and decoding is proposed. The encoding part consists of a Transformer encoder and a CNN encoder, which extract the global and local features of the color image respectively. Previous depth estimation models generally use a single encoder-decoder, whose single encoder can capture either global or local features but not both. The dual encoder provides richer high-level semantic information to the downstream depth estimation task, significantly improving prediction performance and enhancing generalization. In addition, a feature fusion module with no additional parameters is designed.

(3) A new loss function, scale-invariant loss with weight (SILoss-weight), is proposed. Evaluation results over different depth ranges show that the model estimates depth far less accurately for distant pixels than for nearby pixels. An analysis of the depth distribution over the whole dataset suggests that the cause is the extreme imbalance between the numbers of distant and nearby pixels in each color image. The new loss function therefore gives more weight to large depth values, which increases the gradient contributed by distant pixels during training. This compensates for the unbalanced depth distribution and effectively improves prediction on distant pixels. Prediction at short range is no worse than with the original loss function and is even slightly better.

(4) A new data augmentation method, Cut Depth-mask, is proposed. In the data preprocessing stage, a random region of the input color image is replaced by the label depth map. In practice, however, the label depth maps contain many missing values, which hinders training and can even degrade the quality of the predicted depth maps. A masked augmentation method is therefore designed: pixels whose depth value is missing keep their original color image values, and the rectangular replacement region varies in both the vertical and horizontal directions. This augmentation has three advantages: first, it effectively solves the missing-value problem; second, it greatly enriches the model's input images and helps to enhance generalization; third, it adds very little cost to training.

To verify the effectiveness of the proposed methods, two types of experimental datasets were selected, outdoor and indoor. First, the outdoor dataset KITTI and the indoor dataset NYU_V2 are used for training and testing. Extensive experimental results show that the proposed model outperforms current mainstream methods. Second, to demonstrate the improvement in generalization, SUNRGBD, which is also an indoor dataset, is used for evaluation; the model tested on it is the one trained on NYU_V2.
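
The observation behind innovation (3), that accuracy differs across depth ranges, can be reproduced with a per-range evaluation. The following is a minimal sketch, not the thesis's evaluation code: the function name `abs_rel_by_range`, the bin edges, and the use of absolute relative error as the metric are all assumptions made for illustration.

```python
def abs_rel_by_range(pred, target, edges=(0.0, 20.0, 40.0, 80.0)):
    """Mean absolute relative error per ground-truth depth range.

    pred, target: flat sequences of predicted / ground-truth depths (metres).
    edges: hypothetical bin edges; each bin is the half-open range (lo, hi].
    Pixels with target <= edges[0] (e.g. missing depth stored as 0) are skipped.
    """
    results = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Collect the relative error of every pixel whose true depth is in this bin.
        errs = [abs(p - t) / t for p, t in zip(pred, target) if lo < t <= hi]
        # None marks an empty bin rather than reporting a misleading 0.
        results[(lo, hi)] = sum(errs) / len(errs) if errs else None
    return results
```

Comparing the values for the nearest and farthest bins gives a quick check of whether distant pixels are predicted worse than nearby ones.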
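
The idea in innovation (3) of giving more weight to large depth values can be sketched on top of the standard scale-invariant log loss. The abstract does not give the formula for SILoss-weight, so the weighting below (`1 + (depth / max_depth) ** gamma`) and the parameters `lam`, `gamma` and `max_depth` are assumptions chosen only to illustrate the principle, not the thesis's definition.

```python
import math


def weighted_silog_loss(pred, target, lam=0.85, gamma=1.0, max_depth=80.0):
    """Scale-invariant log loss with a depth-dependent pixel weight (sketch).

    pred, target: flat sequences of predicted / ground-truth depths.
    Pixels with non-positive target (missing labels) or prediction are ignored.
    """
    pairs = [(p, t) for p, t in zip(pred, target) if t > 0 and p > 0]
    if not pairs:
        return 0.0
    # Hypothetical weighting: distant pixels (large t) receive a larger weight.
    weights = [1.0 + (t / max_depth) ** gamma for _, t in pairs]
    # Per-pixel error in log space, as in the usual scale-invariant loss.
    diffs = [math.log(p) - math.log(t) for p, t in pairs]
    w_sum = sum(weights)
    mean_sq = sum(w * d * d for w, d in zip(weights, diffs)) / w_sum
    mean = sum(w * d for w, d in zip(weights, diffs)) / w_sum
    # Weighted second moment minus lam times the squared weighted mean.
    return mean_sq - lam * mean * mean
```

With `gamma=0` every pixel gets the same weight and the expression reduces to the unweighted scale-invariant loss, which makes the effect of the weighting easy to compare.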
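
The masked replacement described in innovation (4) can be sketched as follows. This is an illustrative reading of the abstract rather than the thesis's implementation: the function name `cut_depth_mask`, the application probability `p`, the normalization by `max_depth`, the convention that a depth of 0 means "missing", and the way the box is sampled are all assumptions.

```python
import random


def cut_depth_mask(rgb, depth, max_depth=80.0, p=0.75, rng=random):
    """Replace a random rectangle of the color image with depth values,
    keeping the original color wherever the depth label is missing.

    rgb:   H x W nested list of (r, g, b) tuples with values in [0, 1].
    depth: H x W nested list of floats; 0 marks a missing label (assumed).
    Returns a new image; the inputs are not modified.
    """
    h, w = len(rgb), len(rgb[0])
    out = [list(row) for row in rgb]
    if rng.random() > p:
        return out  # leave this sample unaugmented
    # Sample the box position and size along both the vertical and horizontal axes.
    top, left = rng.randrange(h), rng.randrange(w)
    box_h, box_w = rng.randint(1, h - top), rng.randint(1, w - left)
    for y in range(top, top + box_h):
        for x in range(left, left + box_w):
            d = depth[y][x]
            if d > 0:  # the mask: only valid depth pixels replace color
                v = min(d / max_depth, 1.0)
                out[y][x] = (v, v, v)
    return out
```

Passing a seeded `random.Random` instance as `rng` makes the augmentation reproducible, which is convenient when debugging a data pipeline.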
Keywords/Search Tags:monocular depth estimation, depth information perception, dual coding structure, loss function, data enhancement