Research On Intermediate Datasets Placement Strategy For Scientific Workflow In Hybrid Cloud

Posted on: 2017-10-08
Degree: Master
Type: Thesis
Country: China
Candidate: H M Zhang
GTID: 2428330566953025
Subject: Computer Science and Technology
Abstract/Summary:
Deploying data-intensive applications such as scientific workflows in a hybrid cloud extends limited on-premise capacity with the virtually unlimited scalability and wide-area distribution of public cloud resources, which improves execution efficiency and speeds up scientific research. Scientific workflows generate large volumes of intermediate datasets that successor tasks use frequently, and scientific computing is characterized by repeatability and result reuse, so these datasets need to be stored and placed deliberately. However, distributing tasks across datacenters inevitably causes long data transfers between them, which may reduce execution efficiency, while the pay-as-you-go pricing and the security issues of public clouds may lead to high monetary cost and security risk. How to place these datasets in a hybrid cloud so as to achieve a low-cost, high-efficiency and high-security execution environment is therefore an important research problem. To address these issues, this thesis proposes three data placement strategies that target different optimization objectives in two hybrid cloud architectures. The main contents are summarized as follows:

1. This thesis proposes a cost-aware intermediate dataset placement strategy, CRODP, to reduce the resource lease cost in the public cloud. In CRODP, we first build a cost model that measures storage and transmission expenditure globally, and a time model that covers transmission from the datacenter where a dataset resides to the datacenters of its successor tasks. We then formulate the data placement problem, which has been proved NP-hard, as a combinatorial optimization problem whose objective is to save cost and time, and solve it with a chemical-reaction-inspired metaheuristic. Simulation results confirm that CRODP significantly reduces cost compared with the clustering placement strategy.

2. This thesis designs an intermediate dataset placement strategy for a multi-cloud architecture, DPSDP, to support application deployment over a wider area and to avoid vendor lock-in. Considering the path selection problem when data is transmitted between two clouds in different regions, we redesign the time model, which differs slightly from that of the first strategy. We further introduce the differences in cost-effectiveness among service providers into the cost model. Finally, we implement the placement strategy with a discrete particle swarm optimization algorithm, which decreases both cost and time.

3. This thesis designs a security-constrained intermediate dataset placement strategy, GASDPS, which takes data security requirements into consideration, and proposes a global security satisfaction degree (SD) model. By analyzing how clouds provide security services and what security the data requires, we construct a model that evaluates the security satisfaction degree from the security benefit ratio and the security cost, subject to the security constraints being satisfied. We then develop a genetic-algorithm-based placement strategy whose fitness function is constructed from SD and transmission time. Simulation results show that the proposed strategy achieves a high degree of security satisfaction.

In conclusion, the three proposed optimization strategies, each targeting a different objective, have been simulated on the WorkflowSim platform, and the results verify their efficiency in terms of time, cost and security. The thesis also points out limitations that remain to be addressed.
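
The cost and time trade-off described for the first strategy can be illustrated with a small sketch. This is not the thesis's model or its chemical-reaction-inspired solver: the prices, bandwidth, capacity, dataset sizes, the weighted score and the exhaustive search below are all hypothetical stand-ins, chosen only to show why placement is a combinatorial problem with a storage cost term, a transfer cost term and a transfer time term.

```python
from itertools import product

# Hypothetical toy instance; none of these numbers come from the thesis.
STORAGE_PRICE = {"private": 0.0, "public": 0.02}  # $ per GB stored
TRANSFER_PRICE = 0.09            # $ per GB moved between datacenters
BANDWIDTH_GB_PER_S = 0.1         # inter-datacenter bandwidth
PRIVATE_CAPACITY_GB = 50         # limited on-premise storage

# dataset name -> (size in GB, datacenters of the successor tasks reading it)
DATASETS = {
    "d1": (40, ["private", "public"]),
    "d2": (30, ["public"]),
    "d3": (20, ["private"]),
}


def evaluate(placement):
    """Return (monetary cost, transfer time in s) for a dataset -> datacenter map."""
    cost = 0.0
    time = 0.0
    for name, dc in placement.items():
        size, consumers = DATASETS[name]
        cost += size * STORAGE_PRICE[dc]           # storage expenditure
        for consumer_dc in consumers:
            if consumer_dc != dc:                  # data must cross datacenters
                cost += size * TRANSFER_PRICE
                time += size / BANDWIDTH_GB_PER_S
    return cost, time


def feasible(placement):
    """Check that datasets kept on premise fit into the private capacity."""
    used = sum(DATASETS[n][0] for n, dc in placement.items() if dc == "private")
    return used <= PRIVATE_CAPACITY_GB


def best_placement(weight_time=0.001):
    """Exhaustively search all placements; a stand-in for a metaheuristic."""
    names = list(DATASETS)
    best = None
    for choice in product(STORAGE_PRICE, repeat=len(names)):
        placement = dict(zip(names, choice))
        if not feasible(placement):
            continue
        cost, time = evaluate(placement)
        score = cost + weight_time * time          # assumed weighted objective
        if best is None or score < best[0]:
            best = (score, placement, cost, time)
    return best


if __name__ == "__main__":
    score, placement, cost, time = best_placement()
    print(placement, round(cost, 2), round(time, 1))
```

Exhaustive search is only workable here because the instance has three datasets and two datacenters; the number of candidate placements grows exponentially with the number of datasets, which is why the thesis turns to metaheuristics for realistic workflows.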
Keywords/Search Tags:Hybrid cloud, scientific workflow, data placement, cost reduction, security satisfaction