Research On Algorithms Of Scientific Workflow Intermediate Data Set Storage Problem In Cloud Environment Cheng Kun

Posted on:2020-10-06

Degree:Master

Type:Thesis

Country:China

Candidate:K Cheng

GTID:2428330596478882

Subject:Software engineering

Abstract/Summary:

Scientific workflows, which automate the processing of large numbers of scientific tasks, play a major role in many scientific fields. Because they need high-performance computing resources to run their tasks, scientific workflows have traditionally been deployed on clusters or grid systems, which are expensive to build and maintain. As a new-generation computing platform, cloud computing lets users lease large-scale IT resources at relatively low cost. It also allows scientists in different regions to cooperate more flexibly over the Internet, which meets the operational needs of scientific workflows. Running a scientific workflow in a cloud environment generates a large number of useful intermediate datasets. However, both storing all of these huge datasets and regenerating them on demand are costly. The management of these intermediate datasets therefore has an important impact on the execution efficiency of scientific workflows and on the scientific research process. To improve the efficiency of scientific workflows, this thesis studies the storage of intermediate datasets in scientific workflows in the cloud environment. The research work is as follows:

(1) This thesis surveys scientific workflows, cloud computing technology, and the related intermediate dataset storage algorithms for the cloud environment, and analyses the principles, advantages, and disadvantages of the various algorithms, providing a theoretical basis for the follow-up work.

(2) In the single-cloud case, the datasets of an unstructured scientific workflow have complex, irregular dependencies that follow no fixed pattern. Therefore, building on the cost model of the CTT-SP algorithm, this thesis refines the relevant dataset attributes and adds a method for calculating the data integration rate. This avoids adding directed edges to the data dependency graph of the scientific workflow, so that only the cost rate of each dataset needs to be considered. The fitness value of each individual scientific workflow is then calculated, and a differential evolution algorithm with a dynamic mutation model is used to solve the problem. Simulation results show that the differential evolution algorithm is better than the other methods at finding the combination of stored and deleted datasets with the lowest total cost.

(3) In the multi-cloud case, the time complexity of the GT-CSB algorithm for linear scientific workflows is improved. When different cloud service providers apply different resource payment models, the GT-CSB algorithm can find a balance between computation cost, storage cost, and transmission cost with time complexity O(m^4 n^3). However, the algorithm overlooks that the weights of edges sharing the same starting point are calculated repeatedly, which leads to redundant computation. By grouping the edges that originate from the same starting point, the time complexity of the algorithm is reduced to O(m^3 n^3).
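
The store-or-delete trade-off in item (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and everything specific in it is an assumption made for illustration: the workflow is linear, each dataset has a hypothetical generation cost (`gen_cost`), storage cost per unit time (`store_rate`), and usage frequency (`use_freq`), and a deleted dataset is regenerated from its nearest stored predecessor. A linearly decreasing mutation factor stands in for the "dynamic" mutation model, and the population is seeded with the store-all and delete-all plans as baselines.

```python
import random


def total_cost_rate(store, gen_cost, store_rate, use_freq):
    """Cost per unit time of a linear workflow under a store/delete plan (assumed model)."""
    total = 0.0
    regen = 0.0  # cost to rebuild the current dataset from the nearest stored predecessor
    for keep, x, y, v in zip(store, gen_cost, store_rate, use_freq):
        regen += x
        if keep:
            total += y          # stored: pay the storage rate
            regen = 0.0         # successors can be rebuilt from this stored copy
        else:
            total += regen * v  # deleted: pay regeneration each time it is used
    return total


def decode(vector):
    """Map a real-valued individual to a plan: gene > 0.5 means 'store'."""
    return [gene > 0.5 for gene in vector]


def differential_evolution(fitness, dim, pop_size=20, generations=100,
                           cr=0.9, f_max=0.9, f_min=0.4, seed=0):
    """DE/rand/1/bin with a mutation factor that shrinks over the generations."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    pop[0] = [1.0] * dim  # baseline: store every dataset
    pop[1] = [0.0] * dim  # baseline: delete every dataset
    scores = [fitness(decode(ind)) for ind in pop]
    for gen in range(generations):
        # Assumed "dynamic" schedule: explore early, fine-tune late.
        f = f_max - (f_max - f_min) * gen / max(1, generations - 1)
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    gene = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    gene = min(1.0, max(0.0, gene))  # clamp to [0, 1]
                else:
                    gene = pop[i][j]
                trial.append(gene)
            trial_score = fitness(decode(trial))
            if trial_score <= scores[i]:  # greedy selection: keep the cheaper plan
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return decode(pop[best]), scores[best]


if __name__ == "__main__":
    # Made-up example data, not taken from the thesis.
    gen_cost = [10.0, 20.0, 5.0, 40.0]
    store_rate = [1.0, 6.0, 0.5, 3.0]
    use_freq = [0.1, 0.2, 0.5, 0.05]
    plan, cost = differential_evolution(
        lambda s: total_cost_rate(s, gen_cost, store_rate, use_freq), dim=4)
    print(plan, cost)
```

Encoding each plan as a real-valued vector thresholded at 0.5 keeps the standard differential evolution operators unchanged; a binary variant of the algorithm would be an equally reasonable choice.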

Keywords/Search Tags:

Cloud Environment, Scientific Workflow, Data Storage, Differential algorithm

Related items

1	Research On The Intermediate Data Management For Scientific Workflow Systems In Cloud
2	Research On The Intermediate Data Management For Scientific Workflow Systems In Cloud
3	Research On Scientific Workflow Data Layout Strategy For Cloud Environment
4	Research On Multi-objective Scientific Workflow Scheduling Algorithm Under Cloud Environment
5	Efficient scientific workflow scheduling in cloud environment
6	Research On Multiple Requests Data Management For Scientific Workflow Systems In Cloud
7	Research On Key Techniques Of Scientific Workflows In IaaS Environment
8	Research On Optimal Scientific Workflow Scheduling Algorithm With Deadline Constraint In Cloud Environment
9	Makespan And Cost Optimization In Scientific Workflow Scheduling In Cloud Computing Environment
10	Scientific Workflow Data Placement Method Based On Task Assignment And Dataset Replicas In Cloud Environment