Learning contextual information for holistic scene understanding

Posted on: 2013-06-06
Degree: Ph.D
Type: Thesis
University: Cornell University
Candidate: Li, Congcong
Full Text: PDF
GTID: 2458390008473402
Subject: Engineering
Abstract/Summary:
One of the primary goals of computer vision is holistic scene understanding, which involves many sub-tasks such as depth estimation, scene categorization, saliency detection, object detection, and event categorization. Each of these tasks explains some aspect of a particular scene, and fully understanding a scene requires solving all of them. In the human visual system, the sub-tasks are often coupled: one task can leverage the output of another as contextual information for its own decision, and can also feed useful information back to the other tasks. In this thesis, our goal is to design computational algorithms that perform multiple scene understanding tasks collaboratively, as humans do.

In our algorithm design, we consider a two-layer cascade of classifiers, which are repeated instantiations of the original tasks, with the output of the first layer fed into the second layer as input. To better optimize the second-layer outputs, we propose three algorithms that capture contextual information at multiple levels, ranging from interactions between different tasks to interactions between objects and regions.

First, to better leverage the first-layer contextual outputs, we propose Feedback Enabled Cascaded Classification Models (FE-CCM), which jointly optimize all the sub-tasks. Training the two-layer cascade involves a feedback step that lets later classifiers tell earlier classifiers which error modes to focus on, and thus better combines the contextual information from the various image attributes (a schematic sketch follows the abstract).

Second, we consider sharing contextual information between related tasks. We propose the theta-MRF algorithm, which captures the spatial and semantic relationships between classifiers through an undirected graph built on their parameters. The algorithm encourages spatially or semantically related classifiers to share their parameters over the contextual features (a sketch of this penalty also follows the abstract).

Third, we discover new contextual attributes from images given only object annotations, to capture the contextual information between objects and regions. We propose two types of visual structured patterns: the contextual-meta-object (CMO) and the group-of-object (GRP). CMOs capture multi-scale contextual interactions between objects and unlabeled regions, while GRPs capture arbitrary-order interactions among objects that exhibit consistent spatial, scale, and viewpoint relationships with one another. These contextual patterns then serve as contextual attributes for enhanced object recognition and scene recognition.

Finally, we present superior results of the proposed algorithms on a variety of vision applications, including natural scene understanding, image aesthetics assessment, robotic assistive systems, and video applications.
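The abstract describes the FE-CCM cascade only at a high level. As one concrete reading, here is a minimal sketch of a two-layer cascade with an error-reweighting feedback step; the scikit-learn classifiers, the class name FeedbackCascade, and the specific reweighting rule are illustrative assumptions, not the thesis implementation.

# Schematic sketch of a two-layer cascade with feedback, in the
# spirit of FE-CCM. Models, names, and the reweighting rule are
# assumptions for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

class FeedbackCascade:
    """Each task is instantiated twice. Second-layer classifiers see
    the raw features plus all first-layer outputs as context."""

    def __init__(self, n_tasks):
        self.first = [LogisticRegression(max_iter=1000) for _ in range(n_tasks)]
        self.second = [LogisticRegression(max_iter=1000) for _ in range(n_tasks)]

    def _context(self, X):
        # First-layer probability outputs, concatenated as contextual features.
        return np.hstack([clf.predict_proba(X) for clf in self.first])

    def fit(self, X, Y, feedback_rounds=2):
        # Y is a list of per-task label arrays.
        for clf, y in zip(self.first, Y):
            clf.fit(X, y)
        for _ in range(feedback_rounds):
            feats = np.hstack([X, self._context(X)])
            for clf, y in zip(self.second, Y):
                clf.fit(feats, y)
            # Feedback step (schematic): up-weight examples the second
            # layer still gets wrong, so the retrained first layer
            # focuses on those error modes.
            errors = sum((clf.predict(feats) != y).astype(float)
                         for clf, y in zip(self.second, Y))
            for clf, y in zip(self.first, Y):
                clf.fit(X, y, sample_weight=1.0 + errors)
        # Final second-layer fit on the updated first-layer context.
        feats = np.hstack([X, self._context(X)])
        for clf, y in zip(self.second, Y):
            clf.fit(feats, y)

    def predict(self, X):
        feats = np.hstack([X, self._context(X)])
        return [clf.predict(feats) for clf in self.second]

# Example usage with synthetic data for two coupled tasks:
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = [(X[:, 0] > 0).astype(int), (X[:, 0] + X[:, 1] > 0).astype(int)]
model = FeedbackCascade(n_tasks=2)
model.fit(X, Y)
preds = model.predict(X)

The feedback is reduced here to sample reweighting: examples the second layer still misclassifies are up-weighted when the first-layer classifiers are retrained, which is one simple way to realize "telling earlier classifiers which error modes to focus on."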
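The theta-MRF algorithm is likewise only summarized above. One hedged reading of its parameter-sharing idea is a smoothness penalty over an undirected graph whose nodes are the classifiers' parameter vectors; the quadratic form of the penalty and all names below are our assumptions for illustration.

# Schematic reading of theta-MRF parameter sharing: an undirected
# graph over classifier parameters, with a penalty that pulls
# related classifiers' weights together over the contextual features.
import numpy as np

def theta_mrf_penalty(thetas, edges, lam=0.1):
    """thetas: dict mapping task name -> weight vector over the shared
    contextual features; edges: pairs of spatially or semantically
    related tasks. Returns lam * sum of ||theta_i - theta_j||^2 over
    edges, to be added to the sum of the individual task losses."""
    return lam * sum(np.sum((thetas[i] - thetas[j]) ** 2) for i, j in edges)

# Example: encourage object detection and saliency detection to use
# the contextual features similarly.
thetas = {"object": np.zeros(10), "saliency": np.ones(10), "depth": np.ones(10)}
edges = [("object", "saliency"), ("saliency", "depth")]
print(theta_mrf_penalty(thetas, edges))  # 1.0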
Keywords/Search Tags: Scene understanding, Contextual, Tasks, Interactions between objects