Managing scientific workflow provenance

Posted on: 2011-05-01

Degree: Ph.D.

Type: Dissertation

University: University of California, Davis

Candidate: Anand, Manish Kumar

GTID: 1448390002452235

Subject: Computer Science

Abstract/Summary:

An advantage of scientific workflow systems over traditional approaches is their ability to automatically record the provenance (or lineage) of intermediate and final data products generated during workflow execution. The provenance of a data product contains information about how the product was derived, and is crucial for enabling scientists to easily understand, reproduce, and verify scientific results. This work addresses challenges for managing large amounts of provenance information, and describes efficient approaches to model, store, query, visualize, and explore provenance information.

Specifically, a model of provenance is presented that extends the conventional provenance model, supports nested data, and captures fine-grained lineage information. Novel reduction techniques are described to optimize storage size, update time, and query-response time. A high-level query language is proposed to allow nonexperts to easily express provenance graph queries over the model. Query optimization techniques that leverage the storage reductions are described. These optimizations scale with the size of provenance and query complexity, and can also be used in more general settings to efficiently answer a broad range of path queries over labeled, acyclic directed graphs. To further allow users to explore relevant provenance information, a navigation model for provenance is proposed that provides an integrated approach for creating provenance views, navigating between views, and summarizing views. To demonstrate the approaches presented, a Provenance Browser application has been developed and integrated into the Kepler scientific workflow system.
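
As a rough illustration of what a lineage query over a labeled, acyclic directed graph looks like, the sketch below stores provenance as dependency edges and collects everything a data product was derived from. It is a minimal sketch under stated assumptions, not the dissertation's model, query language, or optimizations, and not Kepler's API: the class ProvenanceGraph, the edge labels "used" and "generated", and the node names are all hypothetical.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Toy labeled DAG (hypothetical): an edge (src, label, dst)
    means that dst was derived from, or depends on, src."""

    def __init__(self):
        # Maps each node to the (label, parent) pairs it depends on.
        self._parents = defaultdict(list)

    def add_edge(self, src, label, dst):
        self._parents[dst].append((label, src))

    def lineage(self, node, labels=None):
        """Return every node that `node` transitively depends on.

        If `labels` is given, only edges carrying one of those labels
        are followed, which gives a very simple labeled path query.
        """
        seen = set()
        stack = [node]
        while stack:
            current = stack.pop()
            for label, parent in self._parents.get(current, []):
                if labels is not None and label not in labels:
                    continue
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Hypothetical two-step workflow run: align sequences, then infer a tree.
g = ProvenanceGraph()
g.add_edge("sequences", "used", "align#1")
g.add_edge("align#1", "generated", "alignment")
g.add_edge("alignment", "used", "infer#1")
g.add_edge("params", "used", "infer#1")
g.add_edge("infer#1", "generated", "tree")

# Full lineage of the final product: both invocations and all inputs.
print(sorted(g.lineage("tree")))
```

This naive traversal visits every reachable edge on each query. The storage reductions and query optimizations summarized in the abstract target that cost, and the sketch makes no attempt to reproduce them.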

Keywords/Search Tags:

Provenance, Scientific workflow, Approaches

Related items

1	Research On Scientific Workflow Reuse
2	Research On Workflow Model For Multi-domain Scientific Data Management And Its Provenance Mechanism
3	Representing meaningful provenance in scientific workflow systems
4	Querying and managing OPM-compliant scientific workflow provenance
5	Design And Implementation Of A Provenance Framework In Workflow System-Nebulas
6	Research On Workflow Matching And Discovery Based On Data Unification For Proteomics
7	Enabling Reproducibility of Scientific Data Flows Through Tracking and Representation of Provenance
8	Research On Privacy-preserving Provenance Workflow Publishing
9	Provenance of exploratory tasks in scientific visualization: Management and applications
10	The Research And Implementation Of Scientific Workflow System Based On GOS