
Research on Single-View 3D Reconstruction

Posted on: 2021-03-01    Degree: Master    Type: Thesis
Country: China    Candidate: Z Wang    Full Text: PDF
GTID: 2428330626956036    Subject: Signal and Information Processing
Abstract/Summary:
Since the advent of deep learning, breakthroughs have been made on many long-standing image and graphics problems. In fields such as recognition and segmentation, deep learning-based methods have matured into industrial use, and the same trend applies to single-view 3D reconstruction. However, even though deep learning has brought large improvements over traditional methods, there remains considerable room for research in reconstructing 3D shapes from a single view. This thesis takes single-view 3D reconstruction as its research topic, analyzes the current problems of the task, and proposes feasible improvements that are tested and verified. The work is organized into three parts: signed distance field prediction, camera pose estimation, and model conversion with post-processing.

The signed distance field prediction network is inspired by DISN. To improve reconstruction accuracy and overcome the limited resolution of explicit representations such as voxels and point clouds, this work adopts the implicit signed distance field (SDF) representation. The network not only extracts global features from the image, but also projects 3D query points onto the 2D image according to the camera intrinsics in order to extract local features. In addition, a ReSampler module is proposed to improve the local feature extraction used in DISN. By combining global features, local features, and the encoding of the spatial point, the prediction accuracy of the signed distance field is significantly improved; visually, the reconstructed models are more competitive on fine details such as thin surfaces and holes.

For the loss function, the model regresses the predicted signed distance values against the ground-truth signed distance function, instead of using Chamfer Distance (CD) or Earth Mover's Distance (EMD). Whereas CD and EMD used as training losses can only measure shape similarity approximately, the SDF loss measures it accurately.

The pose estimation network estimates the camera pose from a given image. Although images of objects are abundant on the Internet, camera pose annotations are rare. To supply the pose information needed by the signed distance field prediction network for local feature extraction, this thesis proposes a simple single-image camera prediction network. A VGG network serves as the backbone, and the training data come from the ShapeNet Core dataset: the canonical poses of the ShapeNet Core models are taken as reference poses and rotated to generate training samples. The camera rotation is predicted with the more continuous 6D rotation representation, rather than the traditional quaternion or Euler angles, which accelerates network convergence and improves regression accuracy.
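As a concrete illustration of the signed distance field branch described above, the sketch below shows how 3D query points (assumed to already be in camera coordinates) could be projected through the camera intrinsics to sample per-point local image features, fused with a global image feature, and regressed to signed distance values trained directly against the ground-truth SDF. This is a minimal sketch under assumed tensor shapes; the function names and the decoder interface are illustrative, not the thesis's actual ReSampler implementation.

```python
import torch
import torch.nn.functional as F

def project_points(points, K):
    """Perspective-project camera-space 3D points onto the image plane.

    points: (B, N, 3) query points in camera coordinates
    K:      (B, 3, 3) camera intrinsic matrices
    returns (B, N, 2) pixel coordinates
    """
    proj = torch.bmm(points, K.transpose(1, 2))          # (B, N, 3)
    return proj[..., :2] / proj[..., 2:3].clamp(min=1e-6)

def sample_local_features(feat_map, pix, image_size):
    """Bilinearly sample per-point local features from a 2D feature map.

    feat_map: (B, C, H, W) convolutional feature map of the input image
    pix:      (B, N, 2) pixel coordinates from project_points
    """
    grid = 2.0 * pix / image_size - 1.0                  # normalize to [-1, 1]
    grid = grid.unsqueeze(2)                             # (B, N, 1, 2)
    sampled = F.grid_sample(feat_map, grid, align_corners=False)
    return sampled.squeeze(-1).transpose(1, 2)           # (B, N, C)

def predict_sdf(global_feat, local_feat, points, decoder):
    """Fuse global feature, local feature and point encoding, regress SDF."""
    B, N, _ = points.shape
    g = global_feat.unsqueeze(1).expand(B, N, -1)        # broadcast global code
    x = torch.cat([points, g, local_feat], dim=-1)       # (B, N, 3 + Cg + Cl)
    return decoder(x).squeeze(-1)                        # (B, N) signed distances

def sdf_loss(pred_sdf, gt_sdf):
    """Direct regression on signed distance values, instead of CD/EMD."""
    return F.l1_loss(pred_sdf, gt_sdf)
```

At inference, the predicted SDF can be evaluated on a dense grid and a surface extracted with a method such as Marching Cubes, which is common practice for implicit representations.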
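For the pose estimation network, the continuous 6D rotation representation mentioned above is typically decoded back into a rotation matrix by Gram-Schmidt orthogonalization. The sketch below shows one such decoding; the function name and column ordering are illustrative assumptions, not the thesis's exact code.

```python
import torch
import torch.nn.functional as F

def rotation_6d_to_matrix(d6):
    """Convert a 6D rotation representation to a 3x3 rotation matrix
    via Gram-Schmidt orthogonalization.

    d6: (B, 6) network output, interpreted as two 3D vectors
    returns (B, 3, 3) rotation matrices
    """
    a1, a2 = d6[..., :3], d6[..., 3:]
    b1 = F.normalize(a1, dim=-1)                                   # first basis vector
    b2 = F.normalize(a2 - (b1 * a2).sum(-1, keepdim=True) * b1, dim=-1)
    b3 = torch.cross(b1, b2, dim=-1)                               # right-handed third vector
    return torch.stack([b1, b2, b3], dim=-2)
```

During training, the decoded rotation can be compared against the ground-truth rotation with, for example, a Frobenius-norm or geodesic loss; the continuity of the 6D representation is what helps the regression converge faster than quaternion or Euler-angle targets.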
The main purpose of post-processing is to remove stray outliers caused by signed distance prediction errors. Two methods are provided. The voxel-based neural network method adopts a common encoder-decoder design, taking the voxelized reconstructed models as training samples and the real models as ground truth. This network removes outliers while smoothing the model surface and enhancing details; however, because it operates on voxels, its resolution is limited by the voxel representation. The second method is a conventional algorithm based on breadth-first search: it uses the connectivity between model points to separate different objects in space and remove reconstruction errors, and it can process either mesh or voxel data. This method handles scattered, isolated reconstruction errors but cannot recover details. Because the connectivity between points is weak under the point cloud representation, neither method offers a point cloud-based solution; point cloud data can instead be obtained by converting the processed mesh data.
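The voxel-based post-processing network is described as a standard encoder-decoder trained on reconstructed voxel grids with the real models as ground truth. Below is a minimal 3D convolutional autoencoder along those lines; the 64³ resolution, layer widths, and loss choice are assumptions for illustration, not the thesis's actual architecture.

```python
import torch.nn as nn

class VoxelDenoiser(nn.Module):
    """Encoder-decoder that maps a noisy reconstructed voxel grid to a cleaned one.

    Input/output: (B, 1, D, H, W) occupancy grids, e.g. 64^3 (resolution assumed).
    """
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, 4, stride=2, padding=1), nn.ReLU(inplace=True),   # 64 -> 32
            nn.Conv3d(16, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),  # 32 -> 16
            nn.Conv3d(32, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),  # 16 -> 8
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose3d(16, 1, 4, stride=2, padding=1),                 # back to 64^3
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))  # occupancy logits; train with BCEWithLogitsLoss
```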
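The breadth-first-search method can be realized as a connected-component search over occupied voxels (or, analogously, mesh vertices), discarding components smaller than a size threshold as reconstruction outliers. A sketch for the voxel case follows; the 6-neighbourhood and the min_size threshold are illustrative choices, not values taken from the thesis.

```python
from collections import deque
import numpy as np

def remove_small_components(vox, min_size=50):
    """Keep only sufficiently large connected components of an occupancy grid.

    vox: (D, H, W) boolean occupancy grid
    min_size: components with fewer occupied voxels are treated as outliers
    """
    visited = np.zeros_like(vox, dtype=bool)
    out = np.zeros_like(vox, dtype=bool)
    neighbors = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                 (0, -1, 0), (0, 0, 1), (0, 0, -1)]  # 6-connectivity

    for seed in zip(*np.nonzero(vox)):
        if visited[seed]:
            continue
        # Breadth-first search flood-fills one connected component.
        queue, component = deque([seed]), [seed]
        visited[seed] = True
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in neighbors:
                nz, ny, nx = z + dz, y + dy, x + dx
                if (0 <= nz < vox.shape[0] and 0 <= ny < vox.shape[1]
                        and 0 <= nx < vox.shape[2]
                        and vox[nz, ny, nx] and not visited[nz, ny, nx]):
                    visited[nz, ny, nx] = True
                    queue.append((nz, ny, nx))
                    component.append((nz, ny, nx))
        if len(component) >= min_size:  # small components are treated as outliers
            for idx in component:
                out[idx] = True
    return out
```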
Keywords/Search Tags: Camera pose, 3D reconstruction, perspective projection, signed distance field, single view, autoencoder