Spatio-Temporally Coherent 4D Reconstruction From Multiple View Video

Posted on: 2021-05-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Dong
GTID: 1368330647957235
Subject: Surveying Science and Technology
Abstract/Summary:
Four-dimensional (4D) reconstruction of scenes is a fundamental problem in photogrammetry and computer vision. Its purpose is to model the spatio-temporally coherent structure of dynamic scenes that contain moving objects. Compared with other 4D reconstruction methods, reconstruction from multi-view video offers convenient data acquisition and a wide range of application scenarios, and therefore has greater research value and application prospects. However, current processing methods require a large amount of a priori scene knowledge as constraints, which severely limits their engineering value, and the required computation grows sharply as video resolution and sampling frequency increase. It is therefore of great theoretical significance and practical value to study efficient and robust spatio-temporally coherent 4D reconstruction methods that do not require a priori scene knowledge. To this end, this dissertation studies in depth the theory of spatio-temporally coherent 4D reconstruction from multi-view video, analyzes the existing methods in detail, proposes a series of new processing theories and methods, and verifies them experimentally on a large amount of scene data. The main work and innovations are as follows.

1. The basic theory and processing pipeline of multi-view video 4D reconstruction are studied. Each stage of 4D reconstruction is discussed in detail, and the research status and main problems of the related technologies are analyzed, including data acquisition, feature extraction and matching, camera motion recovery, scene segmentation and dense reconstruction, and scene spatio-temporal coherence. This provides the theoretical basis for studying 4D scene reconstruction.

2. A feature extraction and description method based on superpixels and optimized binary descriptors is proposed, which robustly obtains matching points for wide-baseline or weakly textured images. First, superpixel segmentation is performed on the image, and the edge intersections of the superpixels are defined as the image feature points. The primary and secondary directions are then used to perform a local image deformation, and random image point pairs are used to compute the binary descriptor by intensity comparison. Multiple public datasets and mainstream algorithms are used for comparative experiments. The results show that, for different types of images, the method effectively extracts and matches many points, and the number of correct matches is increased by 2-5 times compared with current state-of-the-art matching algorithms. The method is also applied to sparse 3D reconstruction from multi-view images with good results, which provides good initial matching points for 4D reconstruction in general scenes.

3. A mismatch removal method is proposed, which enables fast and accurate estimation of the relative pose for video sequences. First, the motion consistency principle is transformed into two discriminant criteria, which allows a motion smoothness constraint to be applied directly without solving for the motion smoothing function. Then, on the basis of the two criteria, the local motion consistency constraint is further transformed into a local polynomial mapping constraint to improve the accuracy of the final result. Experiments use multiple public datasets and current mainstream feature extraction and matching algorithms, and compare the method with a variety of current mainstream mismatch removal algorithms. The results show that the method meets the requirements of real-time computation and that its mismatch removal accuracy outperforms that of state-of-the-art algorithms. In addition, analysis on a standard dataset with ground-truth pose parameters verifies that the method meets the requirements of fast and accurate image pose estimation and provides accurate poses for 4D reconstruction.

4. A dense matching method for multi-view video frames that combines deep learning with superpixel information propagation is proposed to achieve highly complete dense reconstruction of complex scenes. First, a neural network predicts the initial depth values. Then, depth plane fitting and joint optimization are performed using the superpixel information. Finally, the optimal depth estimate is obtained through checked pixel-by-pixel propagation of depth information. Experiments on multiple real-scene datasets compare the method with a variety of widely used, high-performing algorithms. The results show that the method outperforms state-of-the-art algorithms in reconstruction completeness and overall score while maintaining high reconstruction accuracy and speed, which provides a good dense point cloud for 4D reconstruction.

5. A spatio-temporally coherent 4D reconstruction method for general dynamic scenes is proposed that requires no prior knowledge of the scene and achieves near real-time or online 4D reconstruction from multi-view video. First, a modular spatio-temporally coherent 4D reconstruction pipeline for dynamic scenes is designed. Then, fast scene depth estimation for video data is achieved by using time-domain constraints; the depth and segmentation information are jointly optimized so that multi-view information is processed consistently; and spatio-temporal coherence of the scene is achieved by fusing the point clouds of preceding and following frames. Finally, an overall optimization is performed to realize spatio-temporally coherent 4D reconstruction with streaming, asynchronous processing. A comprehensive and detailed experimental analysis is conducted on multiple types of multi-view video. The results show that the method provides a spatio-temporally coherent 4D model of a complete dynamic scene in near real-time or online without any a priori camera or scene information, and provides complete correspondence over time for the whole scene and point cloud, which achieves a good 4D reconstruction of the scene.
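As an illustration of the descriptor step in item 2, the sketch below computes a binary descriptor by comparing intensities at random point pairs around a feature point, in the style of BRIEF-like descriptors. It is a minimal sketch under stated assumptions, not the dissertation's implementation: the superpixel-based feature detection and the primary/secondary direction deformation are omitted, and all names (`sample_point_pairs`, `binary_descriptor`, `hamming_distance`) and parameters (`n_pairs`, `radius`) are hypothetical.

```python
import random


def sample_point_pairs(n_pairs, radius, seed=0):
    """Draw random (row, col) offset pairs inside a square patch (assumed sampling scheme)."""
    rng = random.Random(seed)

    def offset():
        return (rng.randint(-radius, radius), rng.randint(-radius, radius))

    return [(offset(), offset()) for _ in range(n_pairs)]


def binary_descriptor(image, keypoint, pairs):
    """Pack one bit per point pair: 1 if the first point is darker than the second."""
    rows, cols = len(image), len(image[0])
    r0, c0 = keypoint

    def intensity(dr, dc):
        # Clamp to the image border so offsets near the edge stay valid.
        r = min(max(r0 + dr, 0), rows - 1)
        c = min(max(c0 + dc, 0), cols - 1)
        return image[r][c]

    bits = 0
    for p, q in pairs:
        bits = (bits << 1) | (1 if intensity(*p) < intensity(*q) else 0)
    return bits


def hamming_distance(a, b):
    """Number of differing bits between two descriptors."""
    return bin(a ^ b).count("1")


# Usage: a synthetic image and a uniformly brightened copy of it.
image = [[(r * 7 + c * 13) % 256 for c in range(32)] for r in range(32)]
brighter = [[v + 10 for v in row] for row in image]

pairs = sample_point_pairs(n_pairs=64, radius=5)
d1 = binary_descriptor(image, (16, 16), pairs)
d2 = binary_descriptor(brighter, (16, 16), pairs)
# An additive brightness change leaves every comparison unchanged,
# so the two descriptors are identical.
print(hamming_distance(d1, d2))
```

Because each bit depends only on the order of two intensities, descriptors of this kind can be matched with a cheap Hamming distance, which is one reason binary descriptors are attractive for video-rate pipelines.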
Keywords/Search Tags: Multi-view video, spatio-temporally coherent 4D reconstruction, feature extraction and matching, mismatch removal, spatial pose recovery, multi-view dense matching