Research On Scientific Workflow Data Layout Strategy For Cloud Environment

Posted on:2019-11-25

Degree:Master

Type:Thesis

Country:China

Candidate:S W Gao

GTID:2428330593450581

Subject:Computer technology

Abstract/Summary:

Scientific workflows running on cloud computing platforms are widely used in science and engineering. Because they can process large-scale data easily and effectively, they are mainly applied to data-intensive processing. Cloud computing lets a platform obtain computing resources and services on demand and scale them dynamically, so scientific workflows that handle intensive data can draw on the flexible, secure, and convenient computing resources the cloud provides. However, as workflows process ever more data and the processing becomes more complex, execution efficiency suffers considerably. When handling large volumes of data, a scientific workflow must account for data transmission between data centers, and a reasonable data placement scheme has a great impact on the efficiency of the whole workflow. How to place data reasonably has therefore become a research hotspot. To address the problems in scientific workflow data placement, this thesis studies data placement strategies for scientific workflows and carries out the following work.

1) A pre-placement strategy for scientific workflow data is proposed. It targets the low execution efficiency caused, in the initial stage of a scientific workflow, by unreasonable data placement and inconsistent data sets. The thesis analyzes the relationships between data and data and between data and tasks, sorts out the flow of the scientific workflow, and establishes a multidimensional data-task vector model from the relationship between data and tasks. Building on a study of data placement methods for scientific workflows, it introduces the dependencies among data sets and proposes a pre-placement strategy based on the HCK-Means strategy. The strategy first applies hierarchical clustering to the original data sets using their multidimensional data-task vectors, and then uses the result as the initial state of K-Means clustering to improve the clustering effect, which finally yields a pre-placement scheme. Experiments verify that the strategy effectively reduces the number of data movements between data centers and improves transmission efficiency.

2) A data adjustment strategy for the execution phase of the scientific workflow is proposed. It targets the slow execution caused by frequent data movement among data centers during the execution phase. The strategy has two parts: a placement method for the intermediate data sets generated during execution, and a dynamic data adjustment method based on selecting and reconstructing part of the data. The intermediate data set placement method first builds a dependency degree model between the intermediate data generated during execution and each data center, and then places each intermediate data set on the data center with which it has the highest dependency degree. The dynamic adjustment method adjusts the data on an overloaded data center as soon as that data center exceeds its capacity limit after the data have been placed globally. The thesis first builds a data linked list for the data sets on the overloaded data center, selects some of the active data from it, and then uses a genetic algorithm to reconstruct the placement of the selected data, exploiting the algorithm's strong search capability over the solution space to optimize data set placement and improve overall execution efficiency.

Both placement strategies proposed in this thesis were evaluated experimentally on a simulation platform. The experiments verify that the proposed data adjustment strategy reduces the number of data transmissions and movements and improves the efficiency of data use.
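
The two-stage idea behind the HCK-Means pre-placement strategy (hierarchical clustering first, its result used as the starting state of K-Means) can be illustrated with a minimal sketch. This is not the thesis's implementation: the binary data-task vectors, the centroid linkage, the Euclidean distance, and all function names below (`hierarchical_seed`, `k_means`, `pre_place`) are assumptions made only for illustration.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def hierarchical_seed(vectors, k):
    """Agglomerative clustering (centroid linkage, an assumption) down to k
    clusters. Returns the k cluster centroids to seed K-Means."""
    clusters = [[v] for v in vectors]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # j > i, so removing cluster j does not shift cluster i.
        clusters[i].extend(clusters.pop(j))
    return [centroid(c) for c in clusters]


def k_means(vectors, centers, max_iter=100):
    """Lloyd iteration starting from the given centers.
    Returns the cluster index assigned to each vector."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: dist(v, centers[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels


def pre_place(data_task_vectors, num_data_centers):
    """Map each data set name to a data center index (hypothetical helper)."""
    names = list(data_task_vectors)
    vectors = [tuple(data_task_vectors[n]) for n in names]
    centers = hierarchical_seed(vectors, num_data_centers)
    return dict(zip(names, k_means(vectors, centers)))


# Hypothetical example: entry t of a vector is 1 if task t uses the data set.
example = {
    "d1": (1, 1, 0, 0),
    "d2": (1, 0, 0, 0),
    "d3": (0, 0, 1, 1),
    "d4": (0, 0, 0, 1),
}
placement = pre_place(example, num_data_centers=2)
```

In this toy input, data sets used by the same tasks end up in the same cluster, so tasks that need them together would find them in one data center. Seeding K-Means from the hierarchical result removes the dependence on random initial centers, which is the motivation the abstract gives for combining the two methods.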

Keywords/Search Tags:

cloud computing, scientific workflow, clustering algorithms, data placement

Related items

1	Data Placement Strategy Research For Scientific Workflow In Hybrid Cloud Computing
2	Research On Data Placement Strategy For Scientific Workflow In Cloud
3	Data Placement Strategy Towards Efficient Execution Of Scientific Workflows In Cloud Computing Platform
4	Research On Data Placement Strategy For Scientific Workflows In Cloud
5	Research On Data Placement Strategy For Data-Sharing Scientific Cloud Workflows
6	Research Of Replication And Placement Strategies For The Intermediate Data Of Scientific Workflow In Cloud
7	Research On Intermediate Datasets Placement Strategy For Scientific Workflow In Hybrid Cloud
8	Research On Scientific Workflow Scheduling Algorithm Based On Cloud Computing
9	Scientific Workflow Data Placement Method Based On Task Assignment And Dataset Replicas In Cloud Environment
10	Makespan And Cost Optimization In Scientific Workflow Scheduling In Cloud Computing Environment