
Research On Algorithms Of Scientific Workflow Intermediate Data Set Storage Problem In Cloud Environment Cheng Kun

Posted on: 2020-10-06   Degree: Master   Type: Thesis
Country: China   Candidate: K Cheng   Full Text: PDF
GTID: 2428330596478882   Subject: Software engineering
Abstract/Summary:
As a tool that can automate service processes for a large number of scientific tasks, scientific workflow has played a major role in many scientific fields. Because scientific workflows require high-performance computing resources, they have traditionally been deployed on clusters or grid systems that are expensive to build and maintain. As a new-generation computing platform, cloud computing lets users lease large-scale IT resources at relatively low cost. It also enables scientists in different regions to cooperate more flexibly over the Internet, which meets the needs of scientific workflow operation. Running a scientific workflow in a cloud environment generates a large number of useful intermediate datasets. However, both storing all of these huge datasets and regenerating all of them on demand are costly. The management of these intermediate datasets therefore has an important impact on the execution efficiency of scientific workflows and on the scientific research process. To improve the efficiency of scientific workflows, this thesis studies the storage of intermediate datasets in scientific workflows in a cloud environment. The research work is as follows:

(1) This thesis summarizes scientific workflows, cloud computing technology, and existing intermediate dataset storage algorithms for the cloud environment, and analyses the principles, advantages, and disadvantages of these algorithms, so as to provide a theoretical basis for the follow-up work.

(2) In the single-cloud case, the datasets of an unstructured scientific workflow have complex, irregular relationships that follow no fixed pattern. Therefore, building on the cost model of the CTT-SP algorithm, the thesis refines the related attributes of the datasets and adds a method for calculating the data integration rate. This avoids adding directed edges to the data dependency graph of the scientific workflow, so only the cost rate of each dataset needs to be considered. The fitness value of each individual (a candidate storage strategy for the workflow) is calculated, and a differential evolution algorithm with a dynamic mutation model is then used to solve the problem. The simulation results show that the differential evolution algorithm finds the storage and deletion combination with the lowest total cost more reliably than the other methods compared.

(3) In the multi-cloud case, the time complexity of the GT-CSB algorithm for solving linear scientific workflow problems is improved. When different cloud service providers apply different payment models for resource usage, the GT-CSB algorithm can find the balance between computation cost, storage cost, and transmission cost with time complexity O(m^4 n^3). However, the algorithm overlooks how weights are calculated for edges that share the same starting point, which leads to computational redundancy. By classifying the edges that originate from the same starting point, the time complexity of the algorithm is improved to O(m^3 n^3).
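The store-or-delete trade-off in item (2) can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: it assumes a linear workflow and a simplified cost-rate model in the spirit of CTT-SP, where a stored dataset costs its storage rate and a deleted dataset costs its usage frequency times the cost of regenerating it from the nearest stored predecessor. The function names, the parameters `gen_cost`, `store_rate`, and `use_freq`, the example numbers, and the linearly decaying mutation factor are all hypothetical stand-ins chosen for illustration.

```python
import itertools
import random


def total_cost_rate(stored, gen_cost, store_rate, use_freq):
    """Total cost per time unit of one storage strategy (assumed model).

    stored[i] is True if dataset i is kept, False if it is deleted.
    A deleted dataset must be regenerated from the nearest stored
    predecessor, so its generation cost includes every deleted
    dataset directly before it in the (linear) workflow.
    """
    total = 0.0
    regen = 0.0  # accumulated generation cost of the current deleted run
    for keep, x, y, v in zip(stored, gen_cost, store_rate, use_freq):
        if keep:
            total += y
            regen = 0.0
        else:
            regen += x
            total += regen * v
    return total


def brute_force(gen_cost, store_rate, use_freq):
    """Exhaustive baseline; only feasible for a handful of datasets."""
    n = len(gen_cost)
    return min(
        itertools.product([False, True], repeat=n),
        key=lambda s: total_cost_rate(s, gen_cost, store_rate, use_freq),
    )


def differential_evolution(gen_cost, store_rate, use_freq,
                           pop_size=20, generations=100, cr=0.9, seed=0):
    """DE/rand/1/bin over real vectors in [0, 1]; a gene > 0.5 means 'store'.

    The mutation factor decays linearly over the generations. This is a
    simple stand-in, not the dynamic mutation model used in the thesis.
    """
    rng = random.Random(seed)
    n = len(gen_cost)

    def decode(vec):
        return tuple(g > 0.5 for g in vec)

    def fitness(vec):
        return total_cost_rate(decode(vec), gen_cost, store_rate, use_freq)

    pop = [[rng.random() for _ in range(n)] for _ in range(pop_size)]
    pop[0] = [1.0] * n  # seed with the store-everything strategy
    pop[1] = [0.0] * n  # seed with the delete-everything strategy
    for g in range(generations):
        f = 0.9 - 0.5 * g / generations
        for i in range(pop_size):
            others = [p for j, p in enumerate(pop) if j != i]
            a, b, c = rng.sample(others, 3)
            j_rand = rng.randrange(n)  # guarantees one mutated gene
            trial = [
                min(1.0, max(0.0, a[k] + f * (b[k] - c[k])))
                if (rng.random() < cr or k == j_rand) else pop[i][k]
                for k in range(n)
            ]
            # Greedy selection: an individual is never replaced by a worse one.
            if fitness(trial) <= fitness(pop[i]):
                pop[i] = trial
    best = min(pop, key=fitness)
    return decode(best), fitness(best)


if __name__ == "__main__":
    # Hypothetical three-dataset workflow d0 -> d1 -> d2.
    gen_cost, store_rate, use_freq = [10, 5, 20], [1, 4, 2], [0.05, 0.5, 0.2]
    print(brute_force(gen_cost, store_rate, use_freq))
    print(differential_evolution(gen_cost, store_rate, use_freq))
```

For the hypothetical instance in the `__main__` block, exhaustive search selects storing the first and third datasets and deleting the second, with a cost rate of 5.5 under this assumed model. Because selection is greedy and the initial population contains both the store-everything and delete-everything strategies, the strategy returned by `differential_evolution` is never worse than either of those two baselines.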
Keywords/Search Tags: Cloud Environment, Scientific Workflow, Data Storage, Differential Evolution Algorithm