Study On Scientific Workflow Management And Scheduling

Posted on: 2012-11-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C C Liu
Full Text: PDF
GTID: 1118330341951640
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, progress in e-Science research has depended, to a large extent, on large-scale scientific applications and software tools that analyze huge volumes of data, and on the computing power of high-performance resources in utility grids. As an effective platform for composing complex processes and executing them automatically, the Scientific Workflow Management System (SWfMS) plays an increasingly important role in this research, and new Scientific Workflow (SWF) technologies have attracted growing attention. Many large e-Science centers have developed dozens of SWfMSs for their own research domains; however, there is no general standard among these systems, and cooperation between them is difficult. It is therefore necessary to build a general-purpose system, either by adapting an existing system to new domains or by developing a new one. Moreover, the "pay-per-use" model is becoming popular as more and more resources are added to the grids, so several objectives must be considered when scheduling workflows on these utility grids, such as workflow execution time, workflow execution cost, and system reliability. These objectives conflict with and constrain one another, so balancing them efficiently has become an active research topic. Based on a discussion of current studies and the drawbacks of existing SWF management and scheduling approaches, this thesis focuses on SWfMS design and on workflow scheduling on utility grids. The main contributions of this thesis can be summarized as follows:

(1) A review of the related technologies and current studies of SWF. We review the current studies of SWF, including lifecycles, models, presentations, languages, and scheduling.
We compare these technologies and analyze recent studies, which provides the basis for the work in this thesis.

(2) Design of a general-purpose SWfMS based on BPEL. To address the difficulty of cooperation among SWfMSs, we exploit the generality of the Business Process Execution Language (BPEL) and design a new BPEL-based SWfMS, the Ensemble Prediction Scientific Workflow system (EPSWFlow), with an application in ensemble prediction. Building on the merits of BPEL, such as its rich control structures and full support for Web services, EPSWFlow dynamically composes and schedules the services involved in the ensemble prediction process. Moreover, EPSWFlow uses JSDL (Job Submission Description Language) to describe the large number of legacy applications that cannot be wrapped as Web services, and it schedules and monitors these applications through the standard job submission system GridSAM, which solves the problem of integrating legacy applications.

(3) Research on the adaptability of a general-purpose SWfMS. To address the dynamic adaptability of SWF, we study the SWF models underlying the architecture and implementation of EPSWFlow and propose a four-level abstract model that describes Web services and legacy applications at each level of abstraction. The SWF engine can select a service dynamically at execution time and bind resources in real time. At the same time, we study system reliability and provide three types of fault-tolerance strategies, which improve the system's ability to handle abnormal situations and further improve its reliability.

(4) Research on cost optimization in deadline-constrained workflow scheduling on utility grids. Workflow scheduling under different environments and conditions is one of the most important topics in SWF management, because the scheduling result strongly affects system performance.
To solve the time-cost trade-off problem in deadline-constrained workflow scheduling, we present three novel algorithms in this thesis: the Temporal Consistency based Deadline Bottom Level algorithm (TCDBL), the Path Balance based Cost Optimization algorithm (PBCO), and the rule-based iterative algorithm BFTCSTM (Best Fit based on Time-dependent Coupling Strength and Temporal Mobility). All of these algorithms reduce workflow cost compared with previous algorithms.

(5) Research on the time-cost trade-off problem with priority factors in workflow scheduling under dynamic environments. Because it is difficult to predict workflow execution time and cost accurately in advance in dynamic grid environments, we study the time-cost trade-off problem in workflow scheduling based on a priority factor. We propose three real-time heuristics based on the bottom level strategy: Bottom Level based Sufferage (BLSuff), Bottom Level based Min-Min (BLMin), and Bottom Level based Min-Max (BLMax). These heuristics divide the tasks into groups according to the synchronization properties of the workflow and use a metric built on the trade-off factor to optimize workflow execution time and cost simultaneously, which yields better scheduling results.

To sum up, we have studied several key problems in scientific workflow management and scheduling and proposed some effective solutions. The studies in this thesis are helpful for further work on the composition and management of complex scientific computations, and can thereby accelerate scientific progress in both theory and practice.
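
The abstract names two building blocks without defining them: the bottom level of a task and a trade-off factor between time and cost. The sketch below illustrates the general ideas only and is not the thesis's TCDBL, BLSuff, BLMin, or BLMax algorithms. It assumes the common textbook definition of bottom level (a task's runtime plus the longest path from it to an exit task) and a hypothetical weighted-sum metric over normalized time and cost; the function names, the parameter alpha, and the example resources are all illustrative assumptions.

```python
def bottom_levels(succ, runtime):
    """Bottom level of each task: its runtime plus the longest
    runtime path from it to an exit task of the workflow DAG.

    succ maps a task to its successor tasks; runtime maps a task
    to its estimated runtime. The graph is assumed to be acyclic.
    """
    levels = {}

    def visit(task):
        if task not in levels:
            longest_tail = max(
                (visit(s) for s in succ.get(task, ())), default=0.0
            )
            levels[task] = runtime[task] + longest_tail
        return levels[task]

    for task in runtime:
        visit(task)
    return levels


def tradeoff_score(time, cost, max_time, max_cost, alpha):
    """Hypothetical metric: weighted sum of normalized time and cost.
    alpha = 1 considers time only, alpha = 0 considers cost only."""
    return alpha * time / max_time + (1 - alpha) * cost / max_cost


def pick_resource(task_length, resources, alpha):
    """Choose the resource with the lowest trade-off score for one task.

    resources maps a name to (speed, price per time unit); both the
    pricing model and the normalization are assumptions of this sketch.
    """
    options = {
        name: (task_length / speed, task_length / speed * price)
        for name, (speed, price) in resources.items()
    }
    max_time = max(t for t, _ in options.values())
    max_cost = max(c for _, c in options.values())
    return min(
        options,
        key=lambda n: tradeoff_score(*options[n], max_time, max_cost, alpha),
    )


# A diamond-shaped workflow: A fans out to B and C, which join at D.
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
runtime = {"A": 2.0, "B": 3.0, "C": 1.0, "D": 4.0}
levels = bottom_levels(succ, runtime)

# Two hypothetical pay-per-use resources: (speed, price per time unit).
resources = {"fast": (2.0, 5.0), "slow": (1.0, 1.0)}
time_first = pick_resource(10.0, resources, alpha=1.0)
cost_first = pick_resource(10.0, resources, alpha=0.0)
```

In this sketch, tasks with a larger bottom level lie on longer remaining paths and would typically be scheduled first, while alpha moves the resource choice between the fastest and the cheapest option.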
Keywords/Search Tags:e-Science, Scientific Workflow (SWF), Ensemble Prediction, Dynamical Adaptability, Workflow Scheduling, Time-cost trade-off, Trade-off Factor