
Design and optimization of scientific workflows

Posted on: 2011-06-13
Degree: Ph.D.
Type: Dissertation
University: University of California, Davis
Candidate: Zinn, Daniel
Full Text: PDF
GTID: 1448390002964164
Subject: Computer Science
Abstract/Summary:
This work considers the design and optimization of scientific workflows. Progress in the natural sciences increasingly depends on effective and efficient means to manage and analyze large amounts of data. Scientific workflows form a crucial piece of cyberinfrastructure, which allows scientists to combine existing components (e.g., for data integration, analysis, and visualization) into larger software systems to conduct this new form of scientific discovery.
We propose VDAL (Virtual Data Assembly Lines), a dataflow-oriented paradigm for scientific workflows. In the VDAL approach, data is organized into nested collections, much like XML, and flows between components during workflow execution. Components are configured with XQuery/XPath-like expressions to specify their interaction with the data. We show how this approach addresses many challenges that are common in scientific workflow design, thus leading to better overall designs. We then study different ways to optimize VDAL execution. First, we show how to leverage parallel computing infrastructure by exploiting the pipeline, task, and data parallelism exhibited by the VDAL paradigm itself. To this end, we compile VDAL workflows into several Map-Reduce tasks, executed in parallel. We then show how the cost of data shipping can be reduced in a distributed streaming implementation. Next, we propose a formal model for VDAL, and show how static analysis can provide additional design features to support the scientist during workflow creation and maintenance, namely, by displaying actor dependencies, previewing the structure of the results, and explaining how output data will be generated from input data. Consequently, certain design errors can be detected prior to the actual workflow execution. Finally, we investigate the fundamental question of how to decide equivalence of VDAL workflows. We show that testing the equivalence of string polynomials, a new problem, reduces to workflow equivalence when an ordered data model is used. Here, our preliminary work defines several normal forms for approximating the equivalence of string polynomials.
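As a rough illustration of the assembly-line idea described above, the following Python sketch models a nested collection as an XML tree and configures each component with an XPath-like scope expression. It is not the dissertation's implementation: the names `run_component` and `run_pipeline`, the sample document, and the use of the standard library's `xml.etree.ElementTree` (which supports only a limited XPath subset) are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def run_component(root, scope, func):
    """Hypothetical component: apply func to the text of each element
    matched by the XPath-like scope; everything else passes through."""
    for elem in root.findall(scope):
        elem.text = func(elem.text)
    return root


def run_pipeline(root, components):
    """Send the nested collection through the components in order,
    like items moving along an assembly line."""
    for scope, func in components:
        root = run_component(root, scope, func)
    return root


# Assumed sample data: a nested collection of samples.
doc = ET.fromstring(
    "<project>"
    "<sample><seq>acgt</seq><note>raw</note></sample>"
    "<sample><seq>ttga</seq></sample>"
    "</project>"
)

# Each component is a (scope expression, function) pair.
pipeline = [
    (".//seq", str.upper),           # normalize case
    (".//seq", lambda s: s[::-1]),   # reverse each sequence
]

result = run_pipeline(doc, pipeline)
seqs = [e.text for e in result.iter("seq")]
print(seqs)
```

In this sketch the `note` element is never matched by a scope expression, so it travels through the pipeline unchanged, which loosely mirrors how a component interacts only with the part of the data its configuration selects.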
Keywords/Search Tags: Workflows, Data, VDAL, Equivalence