
Depth Inference and Visual Saliency Detection from 2D Images

Posted on: 2014-08-10    Degree: Ph.D    Type: Dissertation
University: University of Southern California    Candidate: Wang, Jingwei    Full Text: PDF
GTID: 1458390005985816    Subject: Engineering
Abstract/Summary:
With the rapid development of 3D vision technology, recovering depth information from 2D images has become an active research topic. Current solutions depend heavily on structural assumptions about the 2D image, which limits their applicability, and it remains technically challenging to develop an efficient yet general solution that generates a depth map from a single image. Furthermore, psychological studies indicate that human eyes are particularly sensitive to salient object regions within an image. It is therefore critical to detect salient objects accurately and to segment their boundaries well, since small depth errors in these areas lead to intolerable visual distortion. Briefly speaking, the research in this dissertation falls into two categories: depth map inference system design, and salient object detection and segmentation algorithm development.

For depth map inference system design, we propose a novel depth inference system for 2D images and videos. Specifically, we first adopt in-focus region detection and saliency map computation techniques to separate the foreground objects from the remaining background region. After that, a color-based grab-cut algorithm models the background and removes it from the detected foreground objects. The depth map of the background is then generated by a modified vanishing point detection method, and key frame depth maps are propagated to the remaining frames. Finally, to meet the stringent requirements of VLSI chip implementation, such as limited on-chip memory and real-time processing, we replace some building modules with simplified versions of the in-focus region detection and the mean-shift algorithm. Experimental results show that the proposed solution provides accurate depth maps for 83% of the test images, whereas other state-of-the-art methods achieve comparable accuracy on only 34% of them. This simplified solution, targeting VLSI chip implementation, has been validated for both high accuracy and high efficiency on several test video clips.

For salient object detection, inspired by the success of late fusion in semantic analysis and multi-modal biometrics, we model saliency detection as late fusion at the confidence score level. Specifically, we propose to fuse state-of-the-art saliency models at the score level in a para-boosting learning fashion. First, the saliency maps generated by these models are used as confidence scores. These scores are then fed into our para-boosting learner, i.e., a Support Vector Machine (SVM), Adaptive Boosting (AdaBoost), or a Probability Density Estimator (PDE), to predict the final saliency map. To explore the strength of para-boosting learners, traditional transformation-based fusion strategies such as Sum, Min, and Max are also applied for comparison.

In our application scenario, salient object segmentation is the final goal, so we further propose a novel salient object segmentation scheme using a Conditional Random Field (CRF) graph model. In this segmentation model, we first extract local low-level features, such as the output maps of several saliency models, the gradient histogram, and the position of each image pixel. We then train a random forest classifier, using ground-truth annotations, to fuse the saliency maps into a single high-level feature map. Finally, both low- and high-level features are fed into the CRF and its parameters are learned. The segmentation results are evaluated from two different perspectives: region accuracy and contour accuracy.
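As a rough illustration of the foreground/background separation step in the depth inference pipeline, the sketch below seeds OpenCV's GrabCut with a simple Laplacian focus measure standing in for the in-focus region detection, and uses a linear depth ramp as a toy background depth. The helper names (`focus_map`, `separate_foreground`, `background_depth`) and the ramp are illustrative assumptions, not the dissertation's actual modules or its modified vanishing point method.

```python
import cv2
import numpy as np

def focus_map(gray):
    """Rough in-focus measure: smoothed Laplacian energy (illustrative stand-in)."""
    lap = cv2.Laplacian(gray, cv2.CV_64F)
    energy = cv2.GaussianBlur(lap * lap, (15, 15), 0)
    return cv2.normalize(energy, None, 0.0, 1.0, cv2.NORM_MINMAX)

def separate_foreground(img_bgr, fg_thresh=0.6, bg_thresh=0.2):
    """Seed GrabCut with probable foreground/background labels from the focus map."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    fmap = focus_map(gray)

    mask = np.full(gray.shape, cv2.GC_PR_BGD, dtype=np.uint8)
    mask[fmap > fg_thresh] = cv2.GC_PR_FGD
    mask[fmap < bg_thresh] = cv2.GC_BGD

    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(img_bgr, mask, None, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_MASK)
    return np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD))   # boolean foreground mask

def background_depth(shape, vanish_row):
    """Toy background depth: linear ramp away from an (assumed known) vanishing-point row."""
    h, w = shape
    rows = np.abs(np.arange(h) - vanish_row) / max(vanish_row, h - vanish_row)
    return np.tile(rows[:, None], (1, w))
```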
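The score-level late fusion can be sketched as follows, assuming scikit-learn is available. A LinearSVC stands in for the para-boosting learner, and the Sum/Min/Max rules are shown for comparison; function names and array layouts are illustrative assumptions rather than the dissertation's implementation.

```python
import numpy as np
from sklearn.svm import LinearSVC

def transform_fuse(score_maps, rule="sum"):
    """Transformation-based fusion of per-pixel confidence scores (Sum / Min / Max)."""
    stack = np.stack(score_maps, axis=0)          # (n_models, H, W), each map in [0, 1]
    return {"sum": stack.mean(0), "min": stack.min(0), "max": stack.max(0)}[rule]

def train_score_fusion(score_maps_per_image, gt_masks):
    """Learned score-level fusion: one SVM over the vector of model scores at each pixel."""
    X = np.concatenate([np.stack(maps, -1).reshape(-1, len(maps))
                        for maps in score_maps_per_image])
    y = np.concatenate([m.reshape(-1) for m in gt_masks]).astype(int)
    clf = LinearSVC(C=1.0)
    clf.fit(X, y)
    return clf

def predict_saliency(clf, score_maps):
    """Fuse a new image's score maps into a single saliency map via the learned margin."""
    X = np.stack(score_maps, -1).reshape(-1, len(score_maps))
    margin = clf.decision_function(X).reshape(score_maps[0].shape)
    return (margin - margin.min()) / (np.ptp(margin) + 1e-8)   # rescale to [0, 1]
```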
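For the segmentation stage, the sketch below illustrates only the random-forest fusion of low-level pixel features into a single high-level feature map; a gradient magnitude term stands in for the gradient histogram, the CRF that consumes the low- and high-level features is omitted, and all names are assumptions rather than the dissertation's implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def pixel_features(saliency_maps, gradient_mag):
    """Per-pixel low-level features: model saliencies, gradient magnitude, normalized (x, y)."""
    h, w = gradient_mag.shape
    ys, xs = np.mgrid[0:h, 0:w]
    feats = list(saliency_maps) + [gradient_mag, ys / h, xs / w]
    return np.stack(feats, -1).reshape(-1, len(feats))

def train_high_level_fusion(images_feats, gt_masks, n_trees=100):
    """Random forest mapping low-level pixel features to salient/non-salient.

    images_feats: list of (saliency_maps, gradient_mag) tuples, one per training image.
    """
    X = np.concatenate([pixel_features(*f) for f in images_feats])
    y = np.concatenate([m.reshape(-1) for m in gt_masks]).astype(int)
    rf = RandomForestClassifier(n_estimators=n_trees, n_jobs=-1)
    rf.fit(X, y)
    return rf

def high_level_map(rf, saliency_maps, gradient_mag):
    """High-level feature map: per-pixel foreground probability, later fed to the CRF."""
    proba = rf.predict_proba(pixel_features(saliency_maps, gradient_mag))[:, 1]
    return proba.reshape(gradient_mag.shape)
```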
Extensive experimental comparison on ground truth labeled by human observers shows that both our salient object detection and segmentation models outperform state-of-the-art saliency models and are, so far, the closest to human eyes' performance.
Keywords/Search Tags: Depth, Saliency, Detection, Salient object, Inference, Images