Research On Scheduling And Algorithm For Independent Tasks In Data Grid

Posted on: 2011-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: D Liu
GTID: 2178360308469482
Subject: Computer application technology
Abstract/Summary:
The emergence of the grid has integrated various geographically distributed resources. This makes resource sharing a reality and allows resources to cooperate with each other. At the same time, with remarkable improvements in the capacity and performance of sensors, storage systems, and network-based processing, it has become possible to produce massive files. Data-intensive applications not only require huge computing power but also need to access and process enormous data sets. Traditional computing and storage resources cannot bear this load, which is why the data grid has emerged.

The data sets required by data-intensive applications may be distributed across different storage resources in the grid. When huge data sets are transferred across a wide area, considerable delay is unavoidable, so the efficiency of data transfer has become an increasingly important concern. Because a data set may have multiple replicas on different storage resources, the appropriate storage resource must be selected to improve transfer efficiency. To speed up the execution of a data-intensive application and reduce its makespan, we need both an effective task scheduling solution and an effective method for selecting storage resources.

This paper addresses the independent task scheduling problem in the data grid. A data-intensive application is composed of a set of independent tasks that can be executed in any order. These tasks require great computing power and also need to process huge data sets. Given this characteristic, the problem in this paper can be broken into two sub-problems: first, task scheduling, and second, storage resource selection.
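The following Python sketch shows one way the two sub-problems could combine into a single completion time. The abstract does not state the thesis's time model, so everything here is an assumption. The functions `makespan` and `best_replica` and the dictionary layout are hypothetical. Transfer time is taken as data size divided by link bandwidth, and tasks assigned to the same computing resource are assumed to run one after another.

```python
from collections import defaultdict

def best_replica(task, res, replicas, data_size, bandwidth):
    """Hypothetical greedy choice: the storage resource holding a copy of the
    task's data set with the shortest transfer time to computing resource `res`."""
    return min(replicas[task], key=lambda s: data_size[task] / bandwidth[(s, res)])

def makespan(assign, replica, compute_time, data_size, bandwidth):
    """Hypothetical time model.
    assign[t]  -> computing resource running task t   (sub-problem 1)
    replica[t] -> storage resource task t reads from   (sub-problem 2)
    A resource's finish time is the sum of transfer + compute times of its
    tasks, which are assumed to run sequentially; makespan is the latest finish."""
    finish = defaultdict(float)
    for t, r in assign.items():
        transfer = data_size[t] / bandwidth[(replica[t], r)]
        finish[r] += transfer + compute_time[t][r]
    return max(finish.values())

# Toy example: two tasks, two computing resources (c1, c2), and
# two storage resources (s1, s2) that each hold a replica of both data sets.
compute_time = {"t1": {"c1": 4.0, "c2": 6.0}, "t2": {"c1": 3.0, "c2": 2.0}}
data_size = {"t1": 100.0, "t2": 50.0}
bandwidth = {("s1", "c1"): 50.0, ("s2", "c1"): 25.0,
             ("s1", "c2"): 10.0, ("s2", "c2"): 25.0}
replicas = {"t1": ["s1", "s2"], "t2": ["s1", "s2"]}

assign = {"t1": "c1", "t2": "c2"}
replica = {t: best_replica(t, assign[t], replicas, data_size, bandwidth) for t in assign}
ms = makespan(assign, replica, compute_time, data_size, bandwidth)
print(replica, ms)
```

Under this simplified model each task's replica choice is independent once the task assignment is fixed, so the greedy choice is already optimal. A search method such as tabu search, which the abstract uses for storage selection, becomes worthwhile when choices interact, for example through shared links. This sketch does not model that interaction.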
Since the scheduling objective in this paper is to minimize the overall completion time (makespan) of the application, we first formalize the time model and the problem. We then analyze the characteristics of the two sub-problems and the strengths and weaknesses of the genetic algorithm and tabu search. Based on that analysis, we solve the first sub-problem with a hybrid genetic algorithm whose crossover and mutation operations are improved by tabu search. For the second sub-problem, we treat each computing resource, rather than each task, as the unit of selection and solve it with tabu search. Because the computational complexity of the proposed algorithm is relatively high, we then improve the scheduler model, adopt several strategies to reduce the scale of the second sub-problem, and parallelize the algorithm. Finally, the algorithm is evaluated with GridSim. Simulation results show that, compared with other algorithms, the proposed algorithm has an advantage, to some extent, in reducing the completion time of data-intensive applications.
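The hybrid approach for the first sub-problem can be sketched in Python under several assumptions that are not taken from the thesis:

- A chromosome is a list mapping each task index to a computing-resource index.
- Fitness is the makespan computed from a hypothetical expected-time-to-compute matrix `etc`, with data transfer folded into each entry.
- Selection is simple truncation.
- Tabu search refines only mutated children. The thesis also applies it during crossover, which is omitted here.

All function names and parameter values are illustrative.

```python
import random

def makespan(chrom, etc):
    """chrom[t] is the computing-resource index of task t; etc[t][r] is the
    (hypothetical) time for task t on resource r. Tasks on one resource run
    sequentially, so the makespan is the largest per-resource load."""
    load = [0.0] * len(etc[0])
    for t, r in enumerate(chrom):
        load[r] += etc[t][r]
    return max(load)

def tabu_improve(chrom, etc, iters=20, tenure=5):
    """Short tabu search over single-task reassignments. Each step takes the
    best admissible neighbor (even if worse) and forbids moving that task back
    to its previous resource for `tenure` steps; the best solution seen is returned."""
    current = list(chrom)
    best, best_cost = list(current), makespan(current, etc)
    tabu = {}  # (task, resource) -> last iteration in which that move is forbidden
    for it in range(iters):
        candidates = []
        for t in range(len(current)):
            for r in range(len(etc[0])):
                if r == current[t]:
                    continue
                neighbor = list(current)
                neighbor[t] = r
                cost = makespan(neighbor, etc)
                # Aspiration criterion: a tabu move is allowed if it beats the best so far.
                if tabu.get((t, r), -1) >= it and cost >= best_cost:
                    continue
                candidates.append((cost, t, r))
        if not candidates:
            break
        cost, t, r = min(candidates)
        tabu[(t, current[t])] = it + tenure
        current[t] = r
        if cost < best_cost:
            best, best_cost = list(current), cost
    return best

def crossover(a, b, rng):
    """One-point crossover of two task-to-resource vectors."""
    if len(a) < 2:
        return list(a)
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def hybrid_ga(etc, pop_size=20, generations=30, p_mut=0.3, seed=0):
    rng = random.Random(seed)
    n_tasks, n_res = len(etc), len(etc[0])
    pop = [[rng.randrange(n_res) for _ in range(n_tasks)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: makespan(c, etc))
        elite = pop[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = crossover(a, b, rng)
            if rng.random() < p_mut:
                child[rng.randrange(n_tasks)] = rng.randrange(n_res)  # random mutation
                child = tabu_improve(child, etc)  # tabu search refines the mutant
            children.append(child)
        pop = elite + children
    return min(pop, key=lambda c: makespan(c, etc))

# Toy instance: 5 independent tasks, 2 computing resources.
etc = [[3, 5], [2, 4], [6, 3], [4, 4], [5, 2]]
best = hybrid_ga(etc)
print(best, makespan(best, etc))
```

Keeping the best half of each generation means the best makespan never gets worse from one generation to the next. The tabu list lets the local search accept worse moves without immediately undoing them, which is what distinguishes it from plain hill climbing.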
Keywords/Search Tags: Data Grid, Data-Intensive Application, Scheduling, Genetic Algorithm, Tabu Search