Research On Data Placement Strategy For Scientific Workflows In Cloud

Posted on: 2012-11-11
Degree: Master
Type: Thesis
Country: China
Candidate: P Zheng
Full Text: PDF
GTID: 2218330338463392
Subject: Computer software and theory
Abstract/Summary:
With the development of information technology, data-intensive applications have become widely used in many fields, especially scientific computing. These applications generally involve large numbers of datasets that are usually related to one another. Workflow technology allows their complicated tasks to be executed automatically, and we call such applications scientific workflows. Scientific workflows generally require not only high-performance computing resources but also tremendous storage. As a result, they are mostly deployed in distributed computing systems that can meet their huge demand for computing and storage resources.

As a typical distributed computing technology, cloud computing can provide the resources that scientific workflow applications need at a relatively low cost. Deploying and executing scientific workflows in the cloud can save costs as well as provide good opportunities for collaboration among researchers across the Internet.

However, scientific workflows face new challenges when taking advantage of cloud computing, especially in data placement. Data movements between distributed data centers in a cloud environment are almost inevitable during the execution of scientific workflows, and these movements bring the following main challenges:

1) Some application data have fixed locations and are not allowed to move.
2) A cloud environment usually implies relatively limited network bandwidth, so moving large amounts of application data incurs a large time cost that should not be ignored.
3) Data movements between data centers that belong to different providers always mean additional expense.
In conclusion, data placement, which has a serious influence on data movements between data centers in the cloud, is a critical problem.

In this paper, we first model the data placement problem described above; the model mainly covers the cloud environment, scientific workflow applications, and the time cost caused by data movements between different data centers in the cloud. We then propose two data placement strategies for two different kinds of applications: a global data placement strategy, mainly for relatively stable scientific workflows, and a dynamic data placement strategy, mainly for relatively dynamic ones. Both strategies reduce data movements between data centers, which means lower cost and higher performance.

We build a simulated cloud environment and run a series of simulations in it to evaluate the performance of the two strategies. The simulations show that both strategies perform well, especially in reducing the time cost caused by data movements between data centers. Our research is not only of great importance for scientific workflow applications deployed and executed in the cloud but is also suitable for other data-intensive applications in cloud environments.
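
The abstract does not give the thesis's actual cost model, so the following is only a minimal sketch of how the time cost of data movements might be computed for a given placement. All names (`Dataset`, `Task`, `movement_time`), the per-link bandwidth table, and the simple size-over-bandwidth formula are illustrative assumptions, not the author's formulation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Dataset:
    name: str
    size_gb: float
    fixed_at: Optional[str] = None  # data center the dataset may not leave, if any


@dataclass(frozen=True)
class Task:
    name: str
    inputs: Tuple[str, ...]  # names of the datasets the task reads
    runs_at: str             # data center where the task executes


def movement_time(
    datasets: List[Dataset],
    tasks: List[Task],
    placement: Dict[str, str],                    # dataset name -> data center
    bandwidth_gbps: Dict[Tuple[str, str], float],  # (source, destination) -> Gbit/s
) -> float:
    """Return total seconds spent moving inputs to the tasks that read them.

    Assumed model: each remote read costs size / bandwidth, with no caching.
    """
    by_name = {d.name: d for d in datasets}
    # Reject placements that move a fixed-location dataset.
    for d in datasets:
        if d.fixed_at is not None and placement[d.name] != d.fixed_at:
            raise ValueError(f"{d.name} must stay in {d.fixed_at}")
    total = 0.0
    for t in tasks:
        for name in t.inputs:
            src = placement[name]
            if src != t.runs_at:
                gigabits = by_name[name].size_gb * 8
                total += gigabits / bandwidth_gbps[(src, t.runs_at)]
    return total


# Hypothetical example: two data centers joined by a 1 Gbit/s link.
datasets = [Dataset("d1", 10.0, fixed_at="dc1"), Dataset("d2", 5.0)]
tasks = [
    Task("t1", ("d1", "d2"), "dc1"),
    Task("t2", ("d2",), "dc2"),
    Task("t3", ("d2",), "dc2"),
]
bandwidth = {("dc1", "dc2"): 1.0, ("dc2", "dc1"): 1.0}

cost_a = movement_time(datasets, tasks, {"d1": "dc1", "d2": "dc1"}, bandwidth)
cost_b = movement_time(datasets, tasks, {"d1": "dc1", "d2": "dc2"}, bandwidth)
```

In this toy setup, placing `d2` next to the two tasks that read it most often halves the movement time compared with co-locating it with `d1`, which is the kind of trade-off a placement strategy has to weigh while respecting fixed-location data.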
Keywords/Search Tags: Cloud Computing, Scientific Workflow, Data Placement, Data Dependency