Depth Estimation Of Monocular Video Using Non-parametric Fusion Of Multiple Cues

Posted on: 2015-04-12    Degree: Master    Type: Thesis
Country: China    Candidate: Y M Mo    Full Text: PDF
GTID: 2298330467455747    Subject: Signal and Information Processing
Abstract/Summary:
With the rapid growth in demand for three-dimensional (3D) video, converting two-dimensional (2D) video to 3D has become a new research focus in computer vision and multimedia analysis. Because most existing video conversion algorithms must first recover the camera parameters, we design a depth estimation method for monocular video that uses non-parametric fusion of multiple cues. This non-parametric, learning-based technique estimates comparatively accurate depth sequences from the image contour cue, the geometric perspective cue and the temporal continuity between frames of the monocular video. The main work and innovations of this dissertation are as follows:

1. Most existing depth map estimation methods rely on a single depth cue and are therefore prone to errors. We propose a new depth estimation technique for monocular images that fuses a foreground depth layer with a background depth layer. The foreground layer focuses on the depth information in salient regions and is built on the key hypothesis that visual scenes with similar semantics or photometric content are likely to have similar depths. The background layer uses geometric cues to reflect the overall trend of the depth distribution.

2. The initial foreground depth map estimated by non-parametric learning usually has indistinct boundaries and a relatively cluttered scene structure. To address this, an image-segment-guided depth calibration technique is presented to adjust the initial coarse depth. Graph-based segmentation is used to separate the different objects in the given scene, and the depth values are then averaged within each segment, which embeds the edges and positions of the objects into the corresponding depth map and improves its accuracy.

3. Unlike traditional depth map estimation methods that use geometric cues, the background depth map is estimated by means of linear perspective. The Automatic Grouping of Semantics (AGS) algorithm is introduced to estimate the vanishing points of a single image, and the linear perspective principle is then applied to assign the background depth map from the vanishing cues. The linear perspective of a given scene is classified into one of five categories: top-bottom, left-right, right-left, upper-left corner to lower-right corner, and upper-right corner to lower-left corner. The resulting background depth represents the holistic depth distribution of the visual scene.

4. Simple depth estimation for video sequences typically stitches the depth map of each frame directly into a depth sequence. This dissertation instead jointly exploits the spatial and temporal information across the frames of a monocular sequence to estimate the corresponding depth video sequence. A temporal coherence term and a motion constraint term are introduced into the non-parametric single-image model to extract the depth video sequence. The method effectively improves inter-frame continuity and reduces the depth deviation of moving objects in the given monocular video sequence.

For monocular 2D video without any detailed camera parameters, the proposed approach obtains a depth sequence with clear structures, distinct object boundaries, relatively accurate scene positions and better temporal continuity between frames, enabling effective 2D-to-3D video conversion.
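The segment-guided calibration in item 2 (averaging the coarse depth within each segment) can be illustrated with a minimal sketch. This is not the thesis implementation: the function name `segment_average_depth` is hypothetical, and the segment label map is assumed to have been computed beforehand by some graph-based segmentation (for example, scikit-image's `felzenszwalb` would be one possible choice).

```python
import numpy as np


def segment_average_depth(depth, labels):
    """Replace each pixel's depth with the mean depth of its segment.

    depth  : 2-D array holding the coarse initial depth map.
    labels : 2-D integer array of the same shape; pixels sharing a value
             belong to the same segment (assumed precomputed).
    """
    depth = np.asarray(depth, dtype=np.float64)
    labels = np.asarray(labels)
    if depth.shape != labels.shape:
        raise ValueError("depth and labels must have the same shape")

    # Map arbitrary label values onto consecutive indices 0..K-1.
    _, inverse = np.unique(labels.ravel(), return_inverse=True)
    inverse = inverse.ravel()

    # Per-segment sum and pixel count, then the per-segment mean.
    sums = np.bincount(inverse, weights=depth.ravel())
    counts = np.bincount(inverse)
    means = sums / counts

    # Broadcast each segment's mean back to its pixels.
    return means[inverse].reshape(depth.shape)


if __name__ == "__main__":
    coarse = np.array([[1.0, 2.0, 4.0],
                       [3.0, 6.0, 8.0]])
    segments = np.array([[0, 0, 1],
                         [0, 1, 1]])
    print(segment_average_depth(coarse, segments))
```

Because every pixel in a segment receives the same value, depth discontinuities in the calibrated map coincide with segment boundaries, which is how the object edges become embedded in the depth map.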
Keywords/Search Tags: depth map, non-parametric, multiple cues, machine learning, linear perspective, spatio-temporal