Research On Multi-View 3D Scene Localization In Complex Scenes

Posted on:2021-03-05

Degree:Master

Type:Thesis

Country:China

Candidate:S J Liu

Full Text:PDF

GTID:2428330611480339

Subject:Information and communication engineering

Abstract/Summary:

With the rapid development of artificial intelligence, demand is growing for services such as intelligent robots, autonomous driving, and indoor navigation, which has prompted in-depth research in these areas. These fields share a common basic problem: how a device can localize itself more accurately. Convolutional neural networks (CNNs) perform well in camera localization, but they still suffer from low accuracy and high error rates. One important reason is that position and orientation, two different kinds of parameters, are processed in a unified way. This thesis proposes two end-to-end deep learning methods that regress the position and orientation of the camera from color images. The main work and contributions are summarized as follows:

(1) A dual-stream encoder-decoder localization network (DSEDL-Net) is proposed. The dual-stream structure decouples position and orientation and removes the interference between the two. Because camera position and orientation have different characteristics, the network follows the multi-task idea and predicts them separately, one per stream, which yields more reliable results. A camera pose regressor is also proposed: it transforms the decoded features with either a single-scale downsampling module or a multi-scale aggregation module, then applies global average pooling to capture the spatial information of the features and reduce information loss.

(2) A scene localization network based on joint task learning (JTL-LocNet) is proposed. DSEDL-Net decouples position and orientation completely, yet the two are not fully independent. JTL-LocNet therefore introduces an attention-based gating module that selects and passes on the information each task needs to focus on. This information is also a global feature, so it compensates for the locality of convolution operations and allows information to be shared between tasks. In addition, JTL-LocNet adds auxiliary task branches to DSEDL-Net, which improves network performance. The auxiliary branches (for example crop coordinates, rotation angle, or scaling factor) are attached after the position decoder. On small-scale datasets, the auxiliary tasks act as a regularization term: they supply prior knowledge in the form of constraints, which shrinks the hypothesis space and accelerates convergence.

(3) Extensive experiments on challenging public indoor and outdoor scene datasets demonstrate the effectiveness of the proposed methods. On the indoor Microsoft 7-Scenes dataset, DSEDL-Net reduces the average position and orientation errors by 47.7% and 21.5%, respectively, compared with PoseNet. Compared with LSTM-Pose, JTL-LocNet reduces the average position and orientation errors by 32.3% and 36.5%, respectively. On the outdoor Cambridge Landmarks dataset, the average pose errors of JTL-LocNet are reduced by 44% and 64% compared with PoseNet.

In summary, the two proposed networks achieve good results on public indoor and outdoor datasets, demonstrating the feasibility and effectiveness of the proposed methods for multi-view 3D scene localization.
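The error figures reported in the abstract can be made concrete with a small sketch. It assumes the evaluation convention commonly used for camera pose regression: position error as the Euclidean distance between true and predicted positions, and orientation error as the angle between true and predicted quaternions. The abstract does not state the exact metric definitions used in the thesis, and the function names below are hypothetical illustrations, not the author's code.

```python
import math


def position_error(p_true, p_pred):
    """Euclidean distance between two 3D positions (same unit as the inputs)."""
    return math.dist(p_true, p_pred)


def orientation_error_deg(q_true, q_pred):
    """Angle in degrees between two orientations given as quaternions (w, x, y, z)."""
    # Normalize so the inputs need not be exactly unit length.
    n_true = math.sqrt(sum(c * c for c in q_true))
    n_pred = math.sqrt(sum(c * c for c in q_pred))
    dot = sum(a * b for a, b in zip(q_true, q_pred)) / (n_true * n_pred)
    # q and -q encode the same rotation, so use the absolute value,
    # and clamp to 1.0 to keep acos inside its domain despite rounding.
    dot = min(1.0, abs(dot))
    return math.degrees(2.0 * math.acos(dot))


def relative_reduction(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error
```

For instance, `relative_reduction(2.0, 1.0)` gives 50.0, meaning the error was halved relative to the baseline; the percentages quoted in the abstract are reductions of this kind, measured against PoseNet or LSTM-Pose.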

Keywords/Search Tags:

Deep learning, Convolutional neural network, Scene localization, Camera pose estimation, Encoder-decoder networks

Related items

1	Research On Camera Pose Estimation Method Based On Deep Neural Network
2	Relative Camera Pose Estimation Using Deep Networks
3	Deep Learning Based Key-point Localization Algorithms
4	Research On Object Six-degree-of-freedom Pose Estimation Method Based On Improved Fully Convolutional Neural Network
5	Research On Facial Pose Estimation And Landmarks Localization Based On Deep Learning
6	Head Pose Estimation And 3D Face Reconstruction Using Convolutional Neural Networks
7	Research Of Human Pose Estimation Method Based On Convolutional Neural Network
8	Research On Camera Pose Estimation Based On Unsupervised Learning
9	Visual Data Understanding Based On Deep Encoder-Decoder Framework
10	Research On End-to-end Scene Text Recognition Method Based On Deep Learning