Research Of Improved Algorithmic Approaches For Intermediate Datasets Storage Problem In The Cloud Environment

Posted on: 2019-03-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Wang
GTID: 2428330569496087
Subject: Computer application technology
Abstract/Summary:
As scientific research grows more complex and its computing steps more numerous, running scientific workflows on cloud computing platforms is becoming increasingly popular. Executing a scientific workflow on a cloud platform, however, raises the intermediate dataset storage problem, which is at its core a cost-minimization problem. Such executions generate large volumes of intermediate datasets that are critical to scientists, so deciding how to handle these datasets is particularly important. Cloud platforms provide storage, computing and bandwidth resources for handling them, but users must pay the corresponding storage, computation and transfer costs. The question is therefore how to let users make efficient use of cloud services at minimum cost. Current research on the intermediate dataset storage problem in the cloud environment focuses on three aspects: first, optimization of the cost model; second, storage algorithms for linear and non-linear workflows in a single-cloud environment; third, storage algorithms for linear and non-linear workflows in a multi-cloud environment. This thesis addresses two topics: improving the time efficiency of the cost-minimization algorithm for the intermediate dataset storage problem of linear workflows, and improving the cost model for the intermediate dataset storage problem of non-linear workflows. Specifically:
(1) We set out the theoretical basis of optimization algorithms for the intermediate dataset storage problem in the cloud environment, including the problem definition, the cost model and the corresponding storage algorithms, and we point out the open problems in existing algorithms and the current research focuses.
(2) We use dynamic programming to optimize the storage algorithm for the intermediate dataset storage problem of linear workflows in multiple clouds, reducing the time complexity from O(m^4n^3) to O(m^3n^3).
(3) The existing cost model does not fully express the intermediate dataset storage problem in multiple clouds. Drawing on the literature, we combine dataset usage frequency, access delay tolerance and transfer cost into a new cost model, in which the usage frequency of a dataset is defined as its usage frequency at peak time, which better reflects users' actual needs.
(4) Under the new cost model, we design a new algorithm based on a greedy strategy. Analysis of the simulation results shows that the greedy strategy is feasible for the new cost model.
Finally, the thesis summarizes its main work and outlines future research directions for the intermediate dataset storage problem.
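The abstract does not give the cost model or the algorithms themselves, so the following is only a minimal sketch of the store-or-regenerate trade-off that the storage problem is built on. It makes simplifying assumptions that are not from the thesis: a single cloud, no transfer cost, a linear workflow in which each dataset is computed from its predecessor, and a cost rate per time unit as the objective. The function name `min_cost_rate` and the parameters `gen_cost`, `store_rate` and `use_freq` are hypothetical. A stored dataset costs its storage rate; a deleted dataset costs its usage frequency times the cost of regenerating it from the nearest stored predecessor. The dynamic program below runs in O(n^2) for n datasets and is not the multi-cloud algorithm whose complexity is quoted above.

```python
def min_cost_rate(gen_cost, store_rate, use_freq):
    """Choose which datasets of a linear workflow to keep in storage.

    Assumed model (illustrative, not the thesis's cost model):
      gen_cost[k]   cost of computing dataset k from dataset k-1
      store_rate[k] cost per time unit of keeping dataset k stored
      use_freq[k]   number of uses of dataset k per time unit
    The original input of the workflow is assumed to be always available.
    Returns (minimum total cost rate, sorted 0-based indices of stored datasets).
    """
    n = len(gen_cost)
    inf = float("inf")
    # Positions 1..n are the datasets; 0 is the always-available input and
    # n + 1 is a virtual end marker with zero storage cost.
    # best[j] = minimum cost rate of datasets 1..j given that j is stored.
    best = [0.0] + [inf] * (n + 1)
    prev = [0] * (n + 2)

    for j in range(1, n + 2):
        keep = store_rate[j - 1] if j <= n else 0.0
        freq_sum = 0.0  # total usage frequency of the deleted datasets i+1..j-1
        gap = 0.0       # total regeneration cost rate of those deleted datasets
        for i in range(j - 1, -1, -1):
            # Here i is the nearest stored predecessor of j.
            candidate = best[i] + gap + keep
            if candidate < best[j]:
                best[j] = candidate
                prev[j] = i
            if i >= 1:
                # Next iteration deletes dataset i as well: every deleted
                # dataset in the gap now also pays gen_cost of dataset i.
                freq_sum += use_freq[i - 1]
                gap += gen_cost[i - 1] * freq_sum

    # Walk back from the virtual end marker to recover the stored datasets.
    stored = []
    j = prev[n + 1]
    while j > 0:
        stored.append(j - 1)
        j = prev[j]
    return best[n + 1], sorted(stored)


if __name__ == "__main__":
    # Hypothetical numbers chosen only to illustrate the trade-off.
    cost, stored = min_cost_rate(
        gen_cost=[10, 20, 5],
        store_rate=[1, 8, 2],
        use_freq=[0.5, 0.1, 1.0],
    )
    print(cost, stored)
```

With these hypothetical inputs, the second dataset is expensive to store and rarely used, so the cheapest plan keeps the first and third datasets and regenerates the second on demand. A multi-cloud model would add a choice of cloud per stored dataset and a transfer cost term, which is where the factors of m in the quoted complexities would plausibly come from; that extension is not attempted here.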
Keywords/Search Tags:Intermediate Datasets, Storage Strategies, Minimize Costs, Cloud Computing