
Research On Data Placement Strategy For Scientific Workflow In Cloud

Posted on: 2016-12-12 | Degree: Master | Type: Thesis
Country: China | Candidate: R P Wang
GTID: 2308330470477002 | Subject: Computer software and theory
Abstract/Summary:
As a new application paradigm, scientific workflows can improve the automation of scientific processes by integrating, constructing, and coordinating heterogeneous distributed data, services, and tools. Because scientific workflows are usually data- and computation-intensive, their execution often requires huge computing and storage resources, which traditional computing environments struggle to provide. Cloud computing environments can supply scientific workflows with high-performance computing resources and massive storage. Executing scientific workflows in the cloud may reduce costs significantly and also gives scientists good opportunities to share resources and work collaboratively. However, large and complex scientific workflows in the cloud face challenges in data placement, because they must process and transfer massive amounts of data, and different data placements directly affect workflow efficiency. To address these problems, this thesis explores a data placement strategy based on data dependence and time cost, as well as a cost-effective data placement method for incremental data.

First, taking into account the dependencies among scientific workflow task data and the differences in processing capacity and bandwidth among data centers, a dataset placement strategy is proposed that distributes data rationally across multiple data centers. Then, a cost-effective data placement approach for incremental data is proposed, which automatically decides whether to store or delete intermediate datasets in cloud data centers so that scientific workflows can run in the cloud at lower cost. Data transfer cost optimization for intermediate datasets is also discussed. Finally, simulation results show that the proposed approaches are feasible and effective: the first improves the execution efficiency of scientific workflows, and the second reduces workflow execution cost.
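The abstract does not specify the cost model behind the store-or-delete decision, so the following is only a minimal sketch under stated assumptions. It assumes the intermediate datasets form a linear chain, and that each dataset has a hypothetical generation cost (compute cost to regenerate it from its direct predecessor), a storage cost rate, and an expected usage frequency. A deleted dataset must be regenerated from the nearest stored predecessor, so its regeneration cost accumulates the generation costs of the deleted datasets in between. The rule shown is a simple local greedy comparison, not the thesis's algorithm; all names (`IntermediateDataset`, `decide_storage`, and their fields) are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IntermediateDataset:
    # Hypothetical fields; the thesis abstract does not define a cost model.
    name: str
    gen_cost: float      # cost ($) to regenerate from its direct predecessor
    storage_rate: float  # cost ($ per day) to keep the dataset stored
    usage_rate: float    # expected accesses per day


def decide_storage(chain: List[IntermediateDataset]) -> List[bool]:
    """Greedy store (True) / delete (False) decision along a linear chain.

    Local heuristic: each dataset is judged only against its own costs,
    ignoring how deleting it raises the regeneration cost of later datasets.
    """
    decisions: List[bool] = []
    pending_regen = 0.0  # generation cost accumulated since the last stored dataset
    for ds in chain:
        # Regenerating a deleted dataset replays every deleted predecessor too.
        regen_cost = pending_regen + ds.gen_cost
        # Daily cost if deleted: pay the regeneration cost on every access.
        delete_rate = regen_cost * ds.usage_rate
        store = ds.storage_rate < delete_rate
        decisions.append(store)
        # Storing resets the accumulated cost; deleting carries it forward.
        pending_regen = 0.0 if store else regen_cost
    return decisions


chain = [
    IntermediateDataset("d1", gen_cost=10.0, storage_rate=1.0, usage_rate=0.05),
    IntermediateDataset("d2", gen_cost=5.0, storage_rate=1.0, usage_rate=0.1),
    IntermediateDataset("d3", gen_cost=2.0, storage_rate=3.0, usage_rate=0.2),
]
print(decide_storage(chain))  # [False, True, False]
```

In this example, d1 is cheaper to regenerate occasionally than to store, but because d1 is deleted, d2's regeneration cost rises to 15, which makes storing d2 the cheaper choice. A strategy that optimizes the whole chain jointly, rather than dataset by dataset, could reach a lower total cost than this greedy rule.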
Keywords/Search Tags: cloud computing, scientific workflow, data dependence, data placement, workflow execution cost