
Video Scene Reconstruction And Enhancement

Posted on: 2010-10-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G F Zhang
GTID: 1118360302958543
Subject: Computer Science and Technology
Abstract/Summary:
The key to mixed reality is how to employ computer techniques to effectively represent and integrate the virtual and real worlds while supporting rich interactive feedback. As object complexity increases, directly modeling and rendering objects to achieve realistic visual effects becomes more and more challenging, owing to intractable computational cost and manpower requirements. Since images and videos can be easily captured from the real world, and computer vision techniques can extract and reconstruct computational models consistent with human visual perception, they can effectively remedy the limitations of traditional computer graphics built on idealized mathematical models. Therefore, computer vision and computer graphics have become increasingly integrated in recent years.

For these reasons, this thesis focuses on how to recover and reuse motion and 3D information from captured video data, covering camera tracking, dense depth recovery, optical flow estimation, and video segmentation. With these techniques, the general difficulties in video editing and processing, such as maintaining geometric and illumination coherence and handling occlusions, can be effectively addressed. This work also advances the interaction between computer vision and computer graphics. In summary, the main contributions of this thesis are as follows:

We propose an efficient and robust video-based camera tracking framework. To handle long sequences with varying focal length efficiently and reliably, our method advances Structure-from-Motion (SFM) in several respects. First, a novel initial frame selection scheme makes SFM initialization reliable. Second, we monitor the accumulated error and select an appropriate moment to upgrade the projective reconstruction to a metric one, before the accumulated error damages the self-calibration. Third, a local on-demand scheme in bundle adjustment dramatically accelerates the computation. In addition, to address the reconstruction drift problem in loop-back sequences, we propose an efficient non-consecutive feature tracking method that rapidly recognizes and joins common features scattered over different subsequences; it effectively improves the reconstruction quality of SFM and resolves the drift problem. Based on this work, we develop a complete camera tracking system that automatically recovers the camera parameters together with sparse 3D points from video and film sequences. In particular, our system robustly handles long sequences with varying focal length and outperforms the state-of-the-art commercial software "Boujou Three".

We propose a novel method for recovering high-quality depth maps from a video sequence. We introduce a bundle optimization framework that models matching ambiguities across multiple frames in a statistical way. This framework effectively addresses the major difficulties in stereo reconstruction, such as image noise, occlusions, and outliers, and produces sharp, temporally consistent object boundaries across frames. In addition, a multiple-pass belief propagation algorithm is introduced, which effectively extends the depth levels to increase depth precision in the global optimization without introducing much computational overhead.
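To make the multi-frame matching idea above concrete, the following is a minimal illustrative sketch, not the thesis implementation: it only aggregates a robust photo-consistency cost over several frames for a set of depth hypotheses, whereas the actual bundle optimization also models geometric coherence and visibility statistically. All names (data_cost, project, sigma) are hypothetical.

    # Minimal sketch of a multi-frame photo-consistency data term for
    # per-pixel depth hypotheses.  Hypothetical names; not the thesis code.
    import numpy as np

    def data_cost(ref, neighbors, project, depth_levels, sigma=10.0):
        """Aggregate the matching cost of each depth hypothesis over several frames.

        ref          : (H, W, 3) reference frame
        neighbors    : list of (H, W, 3) nearby frames
        project      : project(x, y, d, k) -> (x', y'), the pixel in neighbor k
                       that a point at depth d behind pixel (x, y) projects to,
                       using the recovered camera parameters
        depth_levels : 1-D array of candidate depth values
        """
        H, W, _ = ref.shape
        cost = np.zeros((len(depth_levels), H, W))
        for di, d in enumerate(depth_levels):
            for k, nb in enumerate(neighbors):
                for y in range(H):
                    for x in range(W):
                        xn, yn = project(x, y, d, k)
                        if 0 <= xn < W and 0 <= yn < H:
                            diff = ref[y, x].astype(float) - nb[int(yn), int(xn)].astype(float)
                            # robust colour similarity: small for good matches,
                            # saturating towards 1 for outliers and occlusions
                            cost[di, y, x] += 1.0 - np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))
                        else:
                            cost[di, y, x] += 1.0  # penalise projections outside the frame
        return cost

A global optimizer such as belief propagation would then select, for each pixel, the depth level minimizing this data term plus a smoothness term over neighbouring pixels.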
The recovered high-quality dense depth maps facilitate many related applications and lay a solid foundation for complex video editing and processing.

We propose an automatic and robust approach to synthesizing stereoscopic videos from ordinary monocular videos acquired with commodity video cameras, which significantly eases the production of stereoscopic video. Instead of recovering depth maps, the proposed method synthesizes the binocular parallax of the stereoscopic video directly from the motion parallax in the monocular video. The synthesis is formulated as an optimization problem with a cost function combining stereoscopic-effect, similarity, and smoothness constraints. With this optimization, convincing and smooth stereoscopic video can be synthesized.

We propose a semi-automatic video editing framework for creating spatio-temporally consistent and visually appealing re-filming effects, which allows a large number of existing video clips to be reused. By analyzing and estimating the 3D geometry, motion, and layer information contained in the video data, we can effectively address the difficulties of re-filming, such as maintaining geometric and illumination coherence and handling occlusions. Based on dense depth recovery and optical flow estimation, we introduce an efficient algorithm that detects and accurately extracts a moving object from a video sequence taken with a hand-held camera (see the sketch below). We also introduce a fast semi-automatic layer separation algorithm for static scenes based on dense depth recovery. These tools can be used to produce a variety of visual effects in our system, including but not limited to video composition, camouflage effects, bullet-time, depth-of-field, and fog synthesis.
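One way to detect such a moving object, sketched below under stated assumptions rather than as the thesis algorithm, is to compare the observed optical flow with the flow that camera motion alone would induce on the recovered static depth; pixels with a large residual are likely to belong to moving objects. The names motion_mask and warp_static, and the threshold value, are hypothetical.

    # Minimal sketch: flag pixels whose motion is not explained by camera
    # motion over the static depth.  Hypothetical names; not the thesis code.
    import numpy as np

    def motion_mask(flow, depth, warp_static, threshold=1.5):
        """Return a boolean mask of pixels that move relative to the static scene.

        flow        : (H, W, 2) observed optical flow from frame t to t+1
        depth       : (H, W)    recovered dense depth of frame t
        warp_static : warp_static(x, y, z) -> (x', y'), where a *static* 3-D point
                      at depth z behind pixel (x, y) would appear in frame t+1,
                      computed from the recovered camera parameters
        """
        H, W, _ = flow.shape
        mask = np.zeros((H, W), dtype=bool)
        for y in range(H):
            for x in range(W):
                xs, ys = warp_static(x, y, depth[y, x])
                rigid_flow = np.array([xs - x, ys - y])
                # a large residual means the pixel's motion cannot be explained
                # by camera motion alone, so it likely belongs to a moving object
                mask[y, x] = np.linalg.norm(flow[y, x] - rigid_flow) > threshold
        return mask

In practice such a raw mask would typically be refined spatially and temporally before the object is accurately extracted, but the residual test above is the core detection idea.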
Keywords/Search Tags: camera tracking, depth recovery, optical flow estimation, video segmentation, mixed reality, video editing, video enhancement