Virtual Reality (VR), as a popular interactive media format, can give users an immersive experience. However, due to the high cost of production, VR has not yet been widely adopted. Existing panoramic images and videos lack depth information and therefore cannot deliver a truly immersive VR experience. Achieving immersion requires obtaining the depth information of the scene and then reconstructing the virtual world from it. Generating depth information for panoramic images is thus a key step toward realizing VR. Therefore, this paper proposes a neural-network-based method for panoramic image depth estimation and 3D scene reconstruction. The work consists of three parts.

First, since a fisheye camera captures a spherical image, the image must be projected before it can be displayed on a two-dimensional plane, and different projection methods introduce different spatial distortions. The common Equi-Rectangular Projection (ERP) cuts and stretches the sphere along the meridian direction; the closer a region is to a pole, the more pronounced the distortion. To reduce the distortion introduced by ERP, this paper proposes a spherical convolution adapted to panoramic images and designs a network model based on it, SPCNet. Experimental results show that SPCNet outperforms the compared networks overall: for example, its RMSE is 0.419 lower than UResNet's and 0.3 lower than RectNet's, and its threshold accuracy of depth estimation reaches 0.994.

Second, to further improve the performance of panoramic depth estimation, this paper proposes two improvements to SPCNet. The first is joint high-resolution (HR) color-image-guided depth map optimization: HR images are used as a supplementary branch, and their detail components are exploited to further refine the depth map. The second is a multi-scale fusion optimization network, which feeds color images at multiple scales into SPCNet so that the encoder-decoder structure can fully learn the global features of the image while adding detail to the depth estimates. Both schemes are compared against the original SPCNet; the experimental results show that multi-scale fusion not only handles the details of the depth map better but also produces better point clouds.

Finally, this paper adopts a bidirectional-LSTM-based network model to predict the 3D layout of panoramic images. The depth information is fused with the predicted 3D layout to reconstruct the 3D scene structure. The reconstructed results are exported in PLY format, and the recovered scene structure matches the real scene. The experimental results show that panoramic images with depth information have clear advantages for 3D scene reconstruction.
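The pole-ward distortion of ERP described above can be illustrated with a short sketch (function names are illustrative, not from the paper): a point on the unit sphere at longitude and latitude maps linearly to pixel coordinates, so each row of the image is stretched horizontally by a factor of 1/cos(latitude), which grows without bound toward the poles.

```python
import math

def erp_project(lon, lat, width, height):
    """Map spherical coordinates (radians) to ERP pixel coordinates.

    Longitude in [-pi, pi) maps linearly to x in [0, width);
    latitude in [-pi/2, pi/2] maps linearly to y in [0, height).
    """
    x = (lon + math.pi) / (2.0 * math.pi) * width
    y = (math.pi / 2.0 - lat) / math.pi * height
    return x, y

def horizontal_stretch(lat):
    """Horizontal stretch factor of ERP at latitude `lat`.

    A circle of latitude has circumference 2*pi*cos(lat), but ERP
    stretches every row to the full image width, so the stretch
    factor is 1 / cos(lat): 1 at the equator, unbounded at the poles.
    """
    return 1.0 / math.cos(lat)

print(round(horizontal_stretch(0.0), 2))               # equator: 1.0
print(round(horizontal_stretch(math.radians(60)), 2))  # 60 degrees: 2.0
```

This divergence is why a fixed planar convolution kernel sees very different spherical footprints at different image rows, motivating a distortion-aware spherical convolution such as the one SPCNet builds on.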
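The quoted evaluation numbers (RMSE and threshold accuracy) are standard depth-estimation metrics. A minimal sketch of how they are computed, assuming per-pixel predicted and ground-truth depths; the threshold 1.25 is the conventional delta < 1.25 criterion, not stated in the text:

```python
import math

def rmse(pred, gt):
    """Root-mean-square error between predicted and ground-truth depths."""
    n = len(pred)
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / n)

def threshold_accuracy(pred, gt, delta=1.25):
    """Fraction of pixels where max(pred/gt, gt/pred) < delta."""
    hits = sum(1 for p, g in zip(pred, gt) if max(p / g, g / p) < delta)
    return hits / len(pred)

# Toy example: the last pixel is badly over-predicted.
pred = [2.0, 3.1, 4.0, 10.0]
gt = [2.1, 3.0, 4.2, 5.0]
print(round(rmse(pred, gt), 3))
print(threshold_accuracy(pred, gt))  # 0.75: the last pixel fails delta < 1.25
```

Lower RMSE and higher threshold accuracy are better, which is the sense in which SPCNet's reported numbers improve on UResNet and RectNet.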
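The final step, turning an ERP depth map into a 3D structure exported as PLY, can be sketched as a simple back-projection (a minimal sketch under the assumption of a row-major per-pixel depth map in the ERP layout; function names are illustrative):

```python
import io
import math

def erp_depth_to_points(depth, width, height):
    """Back-project an ERP depth map (list of rows) to 3D points.

    Each pixel center (u, v) corresponds to a longitude/latitude on the
    sphere; scaling the unit ray by its depth yields a 3D point.
    """
    points = []
    for v in range(height):
        lat = math.pi / 2.0 - (v + 0.5) / height * math.pi
        for u in range(width):
            lon = (u + 0.5) / width * 2.0 * math.pi - math.pi
            d = depth[v][u]
            x = d * math.cos(lat) * math.sin(lon)
            y = d * math.sin(lat)
            z = d * math.cos(lat) * math.cos(lon)
            points.append((x, y, z))
    return points

def write_ply(f, points):
    """Write points to a file-like object as ASCII PLY."""
    f.write("ply\nformat ascii 1.0\n")
    f.write(f"element vertex {len(points)}\n")
    f.write("property float x\nproperty float y\nproperty float z\n")
    f.write("end_header\n")
    for x, y, z in points:
        f.write(f"{x:.6f} {y:.6f} {z:.6f}\n")

# Tiny 4x2 constant-depth map -> 8 points on a sphere of radius 2.
depth = [[2.0] * 4 for _ in range(2)]
pts = erp_depth_to_points(depth, 4, 2)
buf = io.StringIO()
write_ply(buf, pts)
print(buf.getvalue().splitlines()[0])  # ply
```

A constant depth map reconstructs a sphere; a real depth map from the network, combined with the predicted room layout, recovers the actual scene geometry.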