
Data Placement Strategy Towards Efficient Execution Of Scientific Workflows In Cloud Computing Platform

Posted on: 2012-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: S W Liu
Full Text: PDF
GTID: 2218330362460229
Subject: Computer Science and Technology
Abstract/Summary:
In weather forecasting, manned space flight, gene biology computing, high-energy physics data analysis, life science computing, earthquake prediction and other complex scientific research, solving problems requires cooperation among scientists from different fields, disciplines and even regions. Scientific workflow systems have therefore attracted attention as a way to automate the execution sequence of tasks. Traditionally, such systems have had to be deployed on supercomputers, distributed clusters, grid systems or other complex and expensive distributed computing systems. As problems grow more complex, a large scientific workflow often contains thousands of computing tasks that involve not only large-scale data processing of their own but also vast amounts of data transmission. How to ensure the efficient execution of scientific workflows in distributed computing environments has therefore become a prominent and difficult problem.
Recently, with the continuing development of distributed computing, parallel computing, grid computing and other computing models, industry and academia have put forward cloud computing. Cloud computing is a way of sharing infrastructure that turns computing and storage resources in different geographic locations into a virtual resource pool. Users request cloud resources when they need them and release them when their tasks finish, so that the resources can be reused. In this way, cloud computing centers can provide high-performance computing resources and mass storage at low cost.
Although cloud computing, with its efficiency, flexibility and customizability, offers a new way to overcome the difficulties of running scientific workflow systems, the vast amount of data transmitted across different cloud data centers remains a major challenge.
To address this problem, this thesis studies efficient data placement strategies that reduce data traffic between data centers and optimize scientific workflow execution. The main work is as follows:
(1) Improving a clustering-based data placement strategy. We analyze in detail a matrix-based k-means clustering strategy, which can effectively reduce the number of dataset movements during workflow execution. However, that strategy does not take dataset size into account: if the datasets that move are large, the volume of data transferred between data centers is still huge. Taking dataset size into account, this thesis improves the clustering strategy in three respects: data dependence, task scheduling and the placement of generated datasets. Simulations show that the improved strategy can effectively reduce the cost of data traffic among data centers.
(2) Building on the work above, proposing a novel two-stage data placement strategy and a task scheduling strategy for efficient workflow execution. At workflow build-time, the most closely related datasets are placed in the same data center as far as possible, according to their data dependencies. At workflow runtime, the task scheduling strategy schedules each task to the data center most closely related to it and places newly generated datasets in the data center with which they have the strongest dependency. Experimental results show that the proposed strategy can significantly reduce the volume of data transferred among data centers, thereby improving the performance of scientific workflows and reducing the cost of doing science in the cloud.
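The difference between counting dataset movements and accounting for dataset size can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the function names, the toy data, and the particular weighting (shared-task count multiplied by the combined size of the two datasets) are assumptions made only for illustration, and the k-means clustering step itself is omitted.

```python
from itertools import combinations


def dependency(uses, sizes=None):
    """Pairwise dataset dependency.

    uses:  dict mapping dataset name -> set of task ids that read it.
    sizes: optional dict mapping dataset name -> size (any unit).
    Without sizes, the weight is the number of shared tasks.
    With sizes, the count is scaled by the combined size of the pair
    (an assumed weighting, chosen only for this illustration).
    """
    dep = {}
    for a, b in combinations(sorted(uses), 2):
        shared = len(uses[a] & uses[b])
        dep[(a, b)] = shared * (sizes[a] + sizes[b]) if sizes else shared
    return dep


def pick_data_center(task_inputs, placement, sizes):
    """Choose the data center that already holds the largest volume of the
    task's input data; return it with the volume that must still be moved."""
    local = {}
    for d in task_inputs:
        local[placement[d]] = local.get(placement[d], 0) + sizes[d]
    # sorted() makes tie-breaking deterministic (alphabetical order).
    best = max(sorted(local), key=lambda c: local[c])
    moved = sum(sizes[d] for d in task_inputs if placement[d] != best)
    return best, moved


# Hypothetical toy workflow: three datasets, three tasks, two data centers.
uses = {"d1": {"t1", "t2"}, "d2": {"t1", "t2"}, "d3": {"t2", "t3"}}
sizes = {"d1": 10, "d2": 1, "d3": 50}
placement = {"d1": "dc1", "d2": "dc1", "d3": "dc2"}

count_dep = dependency(uses)
sized_dep = dependency(uses, sizes)
center, moved = pick_data_center(["d1", "d2", "d3"], placement, sizes)
```

In this toy case a count-based choice would favor `dc1`, which holds two of the three inputs, and would move 50 units of data; choosing by volume selects `dc2` and moves only 11.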
Keywords/Search Tags: cloud computing, scientific workflow, data dependence, data placement, Nimbus