
Research On Multi-objective Optimization-Based Data Placement Strategy For Scientific Workflows In Cloud Computing Environment

Posted on: 2017-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: H M Cheng
Full Text: PDF
GTID: 2308330485464020
Subject: Computer application technology
Abstract/Summary:
Modern scientific applications in fields such as astronomy, high-energy physics and bioinformatics often consist of hundreds or thousands of tasks and consume huge input datasets, so they need high-performance computing and massive storage space. Scientific workflows are becoming a popular and important mechanism to help scientists automate scientific simulation and data analysis. As scientific research grows more complex, ensuring efficient execution of scientific workflows becomes very important. Cloud computing has globally distributed data centers, so it can provide huge storage resources and high-performance computing to the general public, and its flexibility and efficiency offer a new way to execute workflows. However, running workflows in the cloud raises many challenges, especially in data placement. Huge amounts of data are moved from one data center to another while workflows execute. How to place workflow datasets effectively and reduce data transmission between data centers is therefore very important.

In a cloud computing environment, each data center has its own characteristics, and it is unrealistic to place all datasets in one data center. An efficient data placement strategy is therefore needed to improve the execution efficiency of scientific workflows. At present, most data placement strategies are based on clustering algorithms and evolutionary algorithms, such as the k-means algorithm, genetic algorithms and Particle Swarm Optimization-based algorithms. They place datasets with high dependency in the same data center, which reduces data transfer time during workflow execution. But they ignore load balancing between data centers, which may concentrate datasets in a few data centers and thereby lower the execution efficiency of scientific workflows. 
A good data placement strategy should therefore consider not only data transfer time but also load balancing between data centers.

Accounting for data movement and load balancing at the same time when placing datasets is very difficult, and traditional methods struggle to produce an effective solution. This paper therefore places datasets with a method based on multi-objective optimization. Problems with multiple objectives are usually solved with heuristic methods based on evolutionary algorithms, which are self-adaptive, can avoid local optima and treat the problem as a black box. Such a method can both reduce data transfer time and balance the load of data centers, and so solve the data placement problem effectively.

This paper focuses on data placement strategy, increasing the execution efficiency of scientific workflows by optimizing data transfer time and load balancing between data centers. First, it models the data placement of scientific workflows in cloud computing, illustrates a traditional data placement strategy with an example, points out its shortcomings and gives a suitable data placement scheme. Second, it applies the idea of multi-objective optimization to two objectives: data transfer time and load balancing between data centers. Third, it uses a Knee Point Driven Evolutionary Algorithm (KnEA) to place the datasets of scientific workflows, obtaining a data placement scheme that performs well in both data transfer time and load balancing. Finally, it uses EAS-MOEA/D, an algorithm that combines MOEA/D with external archive search, to place datasets. Building on MOEA/D, this algorithm uses the external archive to guide the search direction, evolves its working population with a decomposition-based strategy and maintains the external archive with non-domination-based sorting. The cloud computing environment and the scientific workflows are simulated in MATLAB. 
This paper runs the two proposed data placement strategies alongside traditional data placement strategies and evaluates their performance by comparing data transfer time and load balancing. The experimental results show that, compared with other similar data placement strategies, the proposed strategies significantly reduce data transfer time and load imbalance between data centers. The outcome of this paper can effectively increase the execution efficiency of scientific workflows and reduce operating costs for cloud service providers, and it has good prospects for further development.
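
The two-objective formulation described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's model and implements neither KnEA nor EAS-MOEA/D; it only shows how a placement might be scored on the two objectives and how a non-dominated external archive can be maintained. All function names are hypothetical, and two modelling choices are assumptions made for illustration: each task is assumed to run at the data center that already holds the largest share of its input data, and load imbalance is measured as the population standard deviation of storage utilization.

```python
from statistics import pstdev
from typing import Dict, List, Sequence, Tuple

Placement = Tuple[int, ...]        # placement[d] = index of the data center holding dataset d
Objectives = Tuple[float, float]   # (transfer volume, load imbalance), both to be minimized


def transfer_volume(placement: Sequence[int], sizes: Sequence[float],
                    tasks: Sequence[Sequence[int]]) -> float:
    """Total size of the data copied between data centers.

    Assumption: a task runs where most of its input data already resides,
    so only the inputs stored elsewhere have to be moved.
    """
    total = 0.0
    for inputs in tasks:
        held: Dict[int, float] = {}  # data center -> size of this task's inputs stored there
        for d in inputs:
            held[placement[d]] = held.get(placement[d], 0.0) + sizes[d]
        if held:
            total += sum(held.values()) - max(held.values())
    return total


def load_imbalance(placement: Sequence[int], sizes: Sequence[float],
                   capacities: Sequence[float]) -> float:
    """Population standard deviation of storage utilization (assumed measure)."""
    used = [0.0] * len(capacities)
    for d, center in enumerate(placement):
        used[center] += sizes[d]
    return pstdev(u / cap for u, cap in zip(used, capacities))


def evaluate(placement: Sequence[int], sizes: Sequence[float],
             tasks: Sequence[Sequence[int]], capacities: Sequence[float]) -> Objectives:
    """Score one placement on both objectives."""
    return (transfer_volume(placement, sizes, tasks),
            load_imbalance(placement, sizes, capacities))


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on every objective and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_archive(archive: List[Tuple[Placement, Objectives]],
                   candidate: Tuple[Placement, Objectives]
                   ) -> List[Tuple[Placement, Objectives]]:
    """Return the archive after offering it one candidate; only non-dominated entries stay."""
    _, cand_obj = candidate
    if any(dominates(obj, cand_obj) or obj == cand_obj for _, obj in archive):
        return archive  # candidate is dominated or duplicates an archived score
    kept = [(p, obj) for p, obj in archive if not dominates(cand_obj, obj)]
    return kept + [candidate]
```

In a full evolutionary algorithm, candidate placements would come from the evolving population, and the archive would hold the trade-off solutions found so far, from which one scheme balancing transfer and load can be chosen.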
Keywords/Search Tags:Cloud Computing, Scientific Workflows, Data Placement, Multi-objective Optimization, Load Balancing