
Research On Scientific Workflow Reuse

Posted on: 2015-09-20
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Chen
Full Text: PDF
GTID: 2298330467457167
Subject: Computer application technology
Abstract/Summary:
Bioinformatics experiments are usually performed with scientific workflows, in which tasks are chained together into intricate, nested graph structures. Scientific workflow systems have been developed to guide users in designing and executing such workflows. An advantage of these systems over traditional approaches is that they automatically record the provenance (or lineage) of the intermediate and final data products generated during workflow execution. The provenance of a data product describes how the product was derived, and it is crucial for enabling scientists to understand, reproduce, and verify scientific results. For several reasons, the complexity of workflows and of their execution structures is increasing over time, which has a clear impact on scientific workflow reuse.

The global aim of this thesis is to enhance workflow reuse. This first requires a feasible strategy for reducing the complexity of workflow structures while preserving provenance, so that results remain correct. Then, building on the simplified workflow structure, we explore a method for querying scientific workflows by structure. The thesis is motivated by these two problems and proposes strategies to address them.

First, we propose an approach that rewrites the graph structure of any scientific workflow into a simpler one, namely a series-parallel (SP) structure, while preserving provenance. SP-graphs are simple and layered, which makes the main phases of a workflow easier to distinguish. From a more formal point of view, polynomial-time algorithms can be designed for complex graph-based operations on SP-graphs (e.g., comparing workflows, which is directly related to the subgraph homomorphism problem). By contrast, many of these operations are known to be NP-hard on DAGs with no restriction on their structure.
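To make the series-parallel property concrete, the following is a minimal sketch of the textbook reduction test for two-terminal SP graphs: a DAG is SP if repeated series and parallel reductions collapse it to a single source-to-sink edge. This only checks the property. It is not the SPFlow rewriting algorithm, and the function name, the edge-list input format, and the node names are assumptions made for illustration.

```python
def is_series_parallel(edges, source, sink):
    """Check whether a two-terminal DAG reduces to one source->sink edge.

    Hypothetical helper for illustration; not part of SPFlow.
    `edges` is an iterable of (u, v) pairs.
    """
    # Storing edges in a set makes parallel reduction implicit:
    # duplicate u->v edges collapse into one.
    edge_set = set(edges)
    changed = True
    while changed:
        changed = False
        preds, succs = {}, {}
        for u, v in edge_set:
            succs.setdefault(u, set()).add(v)
            preds.setdefault(v, set()).add(u)
        for node in list(preds):
            if node in (source, sink):
                continue
            # Series reduction: an inner node with exactly one predecessor
            # and one successor is replaced by a direct edge.
            if len(preds[node]) == 1 and len(succs.get(node, ())) == 1:
                (u,) = preds[node]
                (w,) = succs[node]
                edge_set.discard((u, node))
                edge_set.discard((node, w))
                edge_set.add((u, w))
                changed = True
                break  # degrees changed, so rebuild the adjacency maps
    return edge_set == {(source, sink)}


# A diamond (two parallel branches) is series-parallel.
diamond = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t")]
# Adding a cross edge a->b yields a structure that cannot be reduced.
bridge = diamond + [("a", "b")]
```

In this sketch the diamond reduces fully, while the cross edge in `bridge` blocks every reduction, which is the kind of structure a rewriting step would have to transform.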
The NP-hardness of such operations on general DAGs is a main reason why most scientific workflow systems cannot offer retrieval based on workflow structure. We therefore introduce SPFlow, a provenance-preserving rewriting algorithm that can transform any workflow into an SP structure, together with its associated tool.

Second, building on the results of the first contribution, we can obtain the SP structure of every scientific workflow, which makes querying scientific workflows by graph structure possible. In general, comparing trees is simpler than comparing graphs. A major contribution of our second work is to reduce the graph-based comparison of scientific workflows to finding the common subtrees of the two trees produced by the SP transformation. This is a significant step forward for workflow querying. Because workflow queries are crucial for scientists who want to draw on existing research experience and results, this structure comparison strategy provides an important basis for scientific workflow reuse. We have developed a practical tool for comparing workflow structures.

The two main approaches of this thesis (namely, SPFlow and DFFlow) are based on a provenance model that we introduce to represent the provenance structure of workflow executions. The notion of provenance-equivalence, which determines whether two workflows have the same meaning, is also central to our work. Our solutions have been systematically tested on large collections of real workflows, especially from the Taverna system.
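The idea of comparing two workflows through the common subtrees of their SP trees can be sketched as follows. This is an illustrative simplification, not the DFFlow tool: it assumes each tree is a nested `(label, children)` tuple, that inner nodes are labelled `"S"` (series, ordered) or `"P"` (parallel, unordered), and that only identical rooted subtrees count as common. The task names are hypothetical.

```python
def canonical(tree, forms):
    """Return a canonical string for `tree` and record every subtree form.

    Hypothetical helper: `tree` is a (label, children) tuple.
    """
    label, children = tree
    parts = [canonical(child, forms) for child in children]
    if label == "P":
        # Parallel branches have no order, so sort them before encoding.
        parts.sort()
    form = label + "(" + ",".join(parts) + ")"
    forms.add(form)
    return form


def common_subtrees(tree_a, tree_b):
    """Return canonical forms of the subtrees that occur in both trees."""
    forms_a, forms_b = set(), set()
    canonical(tree_a, forms_a)
    canonical(tree_b, forms_b)
    return forms_a & forms_b


# Two workflows that differ in their first task and in branch order.
wf_a = ("S", [("fetch", []), ("P", [("align", []), ("blast", [])]), ("merge", [])])
wf_b = ("S", [("load", []), ("P", [("blast", []), ("align", [])]), ("merge", [])])
shared = common_subtrees(wf_a, wf_b)
```

Here `shared` contains the parallel block and its leaves but not the differing first task. Encoding each subtree once keeps the comparison close to linear in the size of the trees (plus the cost of sorting parallel branches), which is the practical appeal of comparing trees rather than arbitrary DAGs.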
Keywords/Search Tags: scientific workflows, provenance, provenance-equivalent, series-parallel graphs, query workflow structures