Managing scientific workflow provenance

Posted on: 2011-05-01
Degree: Ph.D.
Type: Dissertation
University: University of California, Davis
Candidate: Anand, Manish Kumar
Full Text: PDF
GTID: 1448390002452235
Subject: Computer Science
Abstract/Summary:
An advantage of scientific workflow systems over traditional approaches is their ability to automatically record the provenance (or lineage) of intermediate and final data products generated during workflow execution. The provenance of a data product contains information about how the product was derived, and is crucial for enabling scientists to easily understand, reproduce, and verify scientific results. This work addresses challenges for managing large amounts of provenance information, and describes efficient approaches to model, store, query, visualize, and explore provenance information.

Specifically, a model of provenance is presented that extends the conventional provenance model, supports nested data, and captures fine-grained lineage information. Novel reduction techniques are described to optimize storage size, update time, and query-response time. A high-level query language is proposed to allow nonexperts to easily express provenance graph queries over the model. Query optimization techniques that leverage the storage reductions are described. These optimizations scale with the size of provenance and query complexity, and can also be used in more general settings to efficiently answer a broad range of path queries over labeled, acyclic directed graphs. To further allow users to explore relevant provenance information, a navigation model for provenance is proposed that provides an integrated approach for creating provenance views, navigating between views, and summarizing views. To demonstrate the approaches presented, a Provenance Browser application has been developed and integrated into the Kepler scientific workflow system.
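
As a rough illustration of what a lineage (path) query over a labeled, acyclic directed graph can look like, the sketch below walks dependency edges backward from a data product. It is not the dissertation's model, query language, or optimization technique: the graph layout, the file and actor names, and the `lineage` function are all hypothetical, chosen only to make the idea concrete.

```python
from collections import deque

# Hypothetical provenance graph (an assumption for illustration only):
# each data product maps to the (actor_label, input_product) pairs
# it was derived from. The graph is acyclic.
DERIVED_FROM = {
    "plot.png": [("render", "stats.csv")],
    "stats.csv": [("summarize", "clean.csv")],
    "clean.csv": [("filter", "raw.csv")],
    "raw.csv": [],
}


def lineage(graph, product, via=None):
    """Return all ancestors of `product`, found by breadth-first traversal.

    If `via` is given, only edges carrying that label are followed.
    """
    seen = set()
    queue = deque([product])
    while queue:
        node = queue.popleft()
        for label, source in graph.get(node, []):
            if via is not None and label != via:
                continue  # skip edges whose label does not match the filter
            if source not in seen:
                seen.add(source)
                queue.append(source)
    return seen
```

Calling `lineage(DERIVED_FROM, "plot.png")` collects every upstream product, while passing `via="render"` restricts the traversal to edges with that label. A plain traversal like this revisits the graph on every query; the abstract's storage reductions and query optimizations address exactly that cost, and are not reproduced here.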
Keywords/Search Tags: Provenance, Scientific workflow, Approaches