Depth-based Object Segmentation and Tracking from Multi-view Video

Posted on: 2012-08-28
Degree: Ph.D.
Type: Thesis
University: The Chinese University of Hong Kong (Hong Kong)
Candidate: Zhang, Qian
GTID: 2458390008995771
Subject: Computer Science
Abstract/Summary:
Automatic and robust object segmentation and tracking are important prerequisites for many applications, such as object editing, recognition and surveillance. Multi-view video, which captures the same real-world scene from different perspectives, makes it possible to reconstruct depth and to characterize visual objects and dynamic scenes with a three-dimensional (3D) interpretation, which is superior to the traditional two-dimensional (2D) representation in terms of visual experience. In this thesis, we present several techniques for multi-view video processing, namely depth reconstruction, object segmentation and tracking from multiple cameras.

As the first step in many multiocular systems, data acquisition using various multi-camera systems is discussed for different applications such as scene analysis and rendering, 3D television and free-viewpoint television. We develop a multi-camera system with five synchronized cameras and an associated control unit for data capture, transmission and storage. Following data acquisition, pre-processing is performed, including camera calibration, color equalization and correction of geometric distortion.

Next, we describe dense stereo matching approaches for both narrow-baseline and wide-baseline multi-view images. Depth information is reconstructed from multiple disparity maps. For narrow-baseline stereo matching, we propose a discontinuity-preserving regularization algorithm that directly couples disparity estimation and occlusion reasoning. Wide-baseline stereo matching is treated as an extension of the narrow-baseline case: the algorithm uses a coarse-to-fine strategy that propagates the sparse matches found at the coarse stage and uses them to constrain a local search at the finer stage.
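As a rough illustration of how a single disparity map relates to depth, the sketch below applies the standard rectified-stereo relation Z = f * B / d. It assumes a rectified pair of pinhole cameras, a focal length given in pixels and a baseline given in metres; the function name and parameters are hypothetical, and the sketch does not attempt the thesis's reconstruction from multiple disparity maps.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Convert a disparity map (list of rows, in pixels) to depth in metres.

    Uses Z = f * B / d, valid for a rectified pinhole stereo pair.
    Pixels with non-positive disparity (no valid match, e.g. occluded)
    are returned as None instead of a depth value.
    """
    return [
        [focal_px * baseline_m / d if d > 0 else None for d in row]
        for row in disparity_px
    ]


if __name__ == "__main__":
    # Hypothetical 2x2 disparity map; 0.0 marks a pixel without a match.
    disparity = [[10.0, 0.0], [20.0, 5.0]]
    print(disparity_to_depth(disparity, focal_px=800.0, baseline_m=0.5))
```

Because depth is inversely proportional to disparity, a fixed matching error in pixels costs more depth accuracy for distant points than for near ones.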
We evaluate the subjective performance of the matching algorithms on narrow-baseline images and on wide-baseline stereo pairs at both identical and different scales.

With depth information available, we then develop algorithms to separate multiple objects in the initial frame of narrow-baseline video and to simultaneously segment objects from wide-baseline images. To segment multiple objects in the initial frame of narrow-baseline video, we consider both spatially separated and overlapped human objects. First, a saliency-based visual attention model is built for automatic object detection and extraction in the key-view image: the saliency map is calculated by incorporating higher-level visual features, and the initial objects of interest (OOIs) are extracted by analyzing the saliency map. Based on the extracted initial OOIs, object segmentation is formulated as a graph cut-based energy minimization problem. To segment multiple isolated objects against a cluttered background, we propose a modified energy function that integrates color, motion, depth and occlusion features; multiple-object segmentation is decomposed into several sub-segmentation problems, each solved by bi-label graph cut. For multiple overlapped human objects, an adaptive background penalty with occlusion reasoning is developed, and multiple features are used to segment individual objects from a group.

To simultaneously segment an object from wide-baseline images, the saliency map is calculated using depth and localization cues. We then construct a 3D graph to enforce depth smoothness and silhouette consistency. Additionally, local background modeling and adaptive data fusion are proposed to achieve better results.
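To make the idea of bi-label graph cut concrete, here is a minimal sketch that labels a 1D row of pixels as background (0) or object (1) by computing a minimum s-t cut. Everything in it is an assumption for illustration: the per-pixel costs are hand-picked toy numbers rather than the color, motion, depth and occlusion terms of the thesis, the smoothness term is a constant penalty between neighbours, and a plain Edmonds-Karp max-flow stands in for whatever solver the thesis uses.

```python
from collections import deque


def min_cut_labels(cost0, cost1, smooth):
    """Binary labeling that minimizes
    sum_i cost_{l_i}[i] + smooth * (number of neighbouring label changes).

    Pixels left on the source side of the minimum cut get label 0,
    pixels on the sink side get label 1.
    """
    n = len(cost0)
    s, t = n, n + 1
    cap = [dict() for _ in range(n + 2)]  # residual capacities

    def add_edge(u, v, c):
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # reverse residual edge

    for i in range(n):
        add_edge(s, i, cost1[i])  # cut when pixel i takes label 1
        add_edge(i, t, cost0[i])  # cut when pixel i takes label 0
        if i + 1 < n:
            add_edge(i, i + 1, smooth)  # cut when neighbours disagree
            add_edge(i + 1, i, smooth)

    def bfs():
        """Return the BFS parent map of nodes reachable from s."""
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    if v == t:
                        return parent
                    queue.append(v)
        return parent

    while True:
        parent = bfs()
        if t not in parent:
            break  # no augmenting path left: the flow is maximal
        # Find the bottleneck capacity along the path, then push flow.
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u

    # After the last BFS, `parent` holds exactly the source-side nodes.
    return [0 if i in parent else 1 for i in range(n)]


if __name__ == "__main__":
    cost0 = [0, 0, 5, 1, 5, 0]  # toy cost of calling each pixel background
    cost1 = [5, 5, 0, 2, 0, 5]  # toy cost of calling each pixel object
    print(min_cut_labels(cost0, cost1, smooth=1))
```

In this toy input the fourth pixel slightly prefers the background label on its own, but the smoothness penalty makes it cheaper to absorb it into the surrounding object region. Running the same two-label solver once per object is one way to read the decomposition into sub-segmentation problems described above.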
The performance of the proposed extraction and segmentation algorithms is demonstrated on self-recorded images and on images from other sources.

Lastly, we introduce a tracking technique that follows the trajectory and updates the connected regions of multiple separated and overlapped human objects across video frames. In the simple case, separated objects are tracked by motion compensation and uncertainty validation. In more complex situations, where multiple overlapped objects undergo inconsistency and severe occlusions, we model motion occlusion as layer transitions, which handles the accumulated compensation and segmentation errors and improves on the performance of the simple tracking strategy. Quantitative and qualitative experimental results are provided to demonstrate the accuracy and robustness of the proposed tracking algorithm. Segmentation and tracking results on self-recorded videos and on sequences from other sources, together with a quantitative comparison against a state-of-the-art technique, show the algorithm's superiority.
Keywords/Search Tags:Object, Video, Depth, Multi-view, Multiple, Algorithm