Learning contextual information for holistic scene understanding

Posted on: 2013-06-06
Degree: Ph.D
Type: Thesis
University: Cornell University
Candidate: Li, Congcong
Full Text: PDF
GTID: 2458390008473402
Subject: Engineering
Abstract/Summary:
One of the primary goals of computer vision is holistic scene understanding, which involves many sub-tasks such as depth estimation, scene categorization, saliency detection, object detection, and event categorization. Each of these tasks explains some aspect of a particular scene, and fully understanding a scene requires solving all of them. In the human visual system, the sub-tasks are often coupled: one task can leverage the output of another as contextual information for its own decision, and can also feed useful information back to the other tasks. In this thesis, our goal is to design computational algorithms that perform multiple scene understanding tasks collaboratively, as humans do.

In our algorithm design, we consider a two-layer cascade of classifiers, which are repeated instantiations of the original tasks, with the output of the first layer fed into the second layer as input. To better optimize the second-layer outputs, we propose three algorithms that capture contextual information at multiple levels, ranging from interactions between different tasks to interactions between objects and regions.

First, to better leverage the first-layer contextual outputs, we propose Feedback Enabled Cascaded Classification Models (FE-CCM), which jointly optimize all the sub-tasks. Training the two-layer cascade involves a feedback step that lets later classifiers tell earlier classifiers which error modes to focus on, and thus better combines the contextual information from the various image attributes (a schematic sketch follows the abstract).

Second, we consider sharing contextual information between related tasks. We propose the theta-MRF algorithm, which captures the spatial and semantic relationships between classifiers through an undirected graph built on their parameters. The algorithm encourages spatially or semantically related classifiers to share their parameters over the contextual features (a sketch of this penalty also follows the abstract).

Third, we discover new contextual attributes from images given only object annotations, to capture the contextual information between objects and regions. We propose two types of visual structured patterns: the contextual-meta-object (CMO) and the group-of-object (GRP). CMOs capture multi-scale contextual interactions between objects and unlabeled regions, while GRPs capture arbitrary-order interactions among objects that exhibit consistent spatial, scale, and viewpoint relationships with one another. These contextual patterns then serve as contextual attributes for enhanced object recognition and scene recognition.

Finally, we present superior results of the proposed algorithms on a variety of vision applications, including natural scene understanding, image aesthetics assessment, robotic assistive systems, and video applications.
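The abstract describes the FE-CCM cascade only at a high level. As one concrete reading, here is a minimal sketch of a two-layer cascade with an error-reweighting feedback step; the scikit-learn classifiers, the class name FeedbackCascade, and the specific reweighting rule are illustrative assumptions, not the thesis implementation.

# Schematic sketch of a two-layer cascade with feedback, in the
# spirit of FE-CCM. Models, names, and the reweighting rule are
# assumptions for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

class FeedbackCascade:
    """Each task is instantiated twice. Second-layer classifiers see
    the raw features plus all first-layer outputs as context."""

    def __init__(self, n_tasks):
        self.first = [LogisticRegression(max_iter=1000) for _ in range(n_tasks)]
        self.second = [LogisticRegression(max_iter=1000) for _ in range(n_tasks)]

    def _context(self, X):
        # First-layer probability outputs, concatenated as contextual features.
        return np.hstack([clf.predict_proba(X) for clf in self.first])

    def fit(self, X, Y, feedback_rounds=2):
        # Y is a list of per-task label arrays.
        for clf, y in zip(self.first, Y):
            clf.fit(X, y)
        for _ in range(feedback_rounds):
            feats = np.hstack([X, self._context(X)])
            for clf, y in zip(self.second, Y):
                clf.fit(feats, y)
            # Feedback step (schematic): up-weight examples the second
            # layer still gets wrong, so the retrained first layer
            # focuses on those error modes.
            errors = sum((clf.predict(feats) != y).astype(float)
                         for clf, y in zip(self.second, Y))
            for clf, y in zip(self.first, Y):
                clf.fit(X, y, sample_weight=1.0 + errors)
        # Final second-layer fit on the updated first-layer context.
        feats = np.hstack([X, self._context(X)])
        for clf, y in zip(self.second, Y):
            clf.fit(feats, y)

    def predict(self, X):
        feats = np.hstack([X, self._context(X)])
        return [clf.predict(feats) for clf in self.second]

# Example usage with synthetic data for two coupled tasks:
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = [(X[:, 0] > 0).astype(int), (X[:, 0] + X[:, 1] > 0).astype(int)]
model = FeedbackCascade(n_tasks=2)
model.fit(X, Y)
preds = model.predict(X)

The feedback is reduced here to sample reweighting: examples the second layer still misclassifies are up-weighted when the first-layer classifiers are retrained, which is one simple way to realize "telling earlier classifiers which error modes to focus on."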
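The theta-MRF algorithm is likewise only summarized above. One hedged reading of its parameter-sharing idea is a smoothness penalty over an undirected graph whose nodes are the classifiers' parameter vectors; the quadratic form of the penalty and all names below are our assumptions for illustration.

# Schematic reading of theta-MRF parameter sharing: an undirected
# graph over classifier parameters, with a penalty that pulls
# related classifiers' weights together over the contextual features.
import numpy as np

def theta_mrf_penalty(thetas, edges, lam=0.1):
    """thetas: dict mapping task name -> weight vector over the shared
    contextual features; edges: pairs of spatially or semantically
    related tasks. Returns lam * sum of ||theta_i - theta_j||^2 over
    edges, to be added to the sum of the individual task losses."""
    return lam * sum(np.sum((thetas[i] - thetas[j]) ** 2) for i, j in edges)

# Example: encourage object detection and saliency detection to use
# the contextual features similarly.
thetas = {"object": np.zeros(10), "saliency": np.ones(10), "depth": np.ones(10)}
edges = [("object", "saliency"), ("saliency", "depth")]
print(theta_mrf_penalty(thetas, edges))  # 1.0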
Keywords/Search Tags: Scene understanding, Contextual, Tasks, Interactions between objects