Research On Key Technologies Of 3D Vision

Posted on: 2019-11-03 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: Y L Feng | Full Text: PDF
GTID: 1368330611492949 | Subject: Computer Science and Technology
Abstract/Summary:
Computer vision aims to enable computers to understand the complex environment around them, and it is a valuable research area. However, most existing computer vision systems, e.g., intelligent transportation systems and video surveillance systems, use only RGB information as input. The absence of depth information weakens the robustness and reliability of these systems. As computer vision develops, it is becoming part of daily life, changing the way we live and improving our quality of life, so the robustness and reliability of computer vision systems matter more and more. To strengthen such systems, this dissertation studies 3D stereo vision, which comprises two parts: vision perception and vision cognition. In vision perception, we focus on how to obtain high-quality depth information, i.e., stereo matching and depth completion; in vision cognition, we focus on object detection with depth information. The main contributions of this dissertation are as follows:

1. For stereo matching, we propose three approaches. (1) We propose an efficient convolutional neural network that measures how likely two patches are to match, and we use it to compute the stereo matching cost. The architecture uses large image patches, which makes the result more robust in texture-less areas and in areas with repetitive textures. (2) We propose a novel one-stage network that exploits the reconstruction error as a skip connection for disparity estimation. Reconstruction errors are obtained by calculating the absolute difference between the features of the left image and the back-warped features of the right image, using early predicted disparity maps. (3) We present an end-to-end trainable convolutional neural network for stereo matching that covers both initial disparity estimation and disparity refinement. These tasks are handled by two tightly coupled sub-networks: the initial disparity estimation sub-network (DES-net) and the disparity refinement sub-network (DRS-net). Refinement reuses the features from DES-net to calculate the reconstruction error, which is fed to DRS-net to learn the residual of the initial disparity with respect to the ground truth. We evaluate our methods on several challenging datasets and show that they achieve state-of-the-art performance on all of them.

2. For depth completion, we propose a novel encoder-decoder architecture with sparse convolution operations and image guidance. The encoder has two branches: one uses sparse convolution layers to handle the sparse depth input, which makes the output invariant to the sparsity level; the other uses standard convolution layers to extract RGB features from the RGB input as guidance. The two branches merge at the beginning of the decoder, and the sparse depth information and dense RGB information are fused at multiple scales, which improves the quality of depth completion.

3. For object detection, we propose two approaches. (1) We propose several geometric features suited to autonomous driving and integrate them into state-of-the-art general proposal generation methods. Specifically, we formulate the integration as a feature fusion problem, fusing the geometric features with existing proposal generation methods in a Bayesian framework. (2) We propose a new geocentric embedding for depth images, named DEH channels, that encodes for each pixel the depth to the camera, the structural edges, and the height above the ground plane. We demonstrate that this geocentric embedding can be used to generate high-quality object proposals with convolutional neural networks.
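The reconstruction error used in the stereo matching contributions (the absolute difference between left-image features and right-image features back-warped with a predicted disparity) can be illustrated with a minimal NumPy sketch. This is not the dissertation's implementation: the function names `warp_right_to_left` and `reconstruction_error` are hypothetical, and the sketch assumes rectified stereo with the convention x_right = x_left - d, linear interpolation along the horizontal axis, and border clamping for out-of-range samples.

```python
import numpy as np


def warp_right_to_left(right_feat, disparity):
    """Back-warp right-view features into the left view.

    right_feat: (H, W, C) feature map of the right image.
    disparity:  (H, W) left-view disparity in pixels (assumed non-negative).
    Assumption: left pixel (y, x) corresponds to right pixel (y, x - d).
    """
    h, w, _ = right_feat.shape
    # Horizontal sampling positions in the right view, clamped to the border.
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity
    xs = np.clip(xs, 0.0, w - 1.0)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = (xs - x0)[..., None]          # (H, W, 1) interpolation weight
    rows = np.arange(h)[:, None]         # (H, 1), broadcasts against (H, W)
    # Linear interpolation between the two nearest columns.
    return (1.0 - frac) * right_feat[rows, x0] + frac * right_feat[rows, x1]


def reconstruction_error(left_feat, right_feat, disparity):
    """Per-pixel, per-channel absolute difference after back-warping."""
    return np.abs(left_feat - warp_right_to_left(right_feat, disparity))
```

When the predicted disparity is correct, the error is close to zero wherever the left pixel is visible in the right view; large values flag regions where the disparity estimate is likely wrong, which is what makes the error map a useful input for refinement.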
Keywords/Search Tags: Vision perception, Vision cognition, Stereo matching, Depth completion, Object detection, Deep learning