
Data-aware Scheduling And Data Management Based On LSF And Gfarm

Posted on: 2007-08-24  Degree: Master  Type: Thesis
Country: China  Candidate: J H Jiang  Full Text: PDF
GTID: 2178360182496247  Subject: Computer system architecture
Abstract/Summary:
LSF (Load Sharing Facility) is developed by Platform Computing. It aims at solving the scheduling and management of computing-intensive jobs in cluster environments. Gfarm (Grid Data Farm) is developed by AIST, Japan. It is strong at handling scientific data; however, it is deficient in job scheduling and data management. To remedy this deficiency of the Gfarm filesystem, this thesis develops an LSF scheduler plug-in based on a data-aware algorithm and a data management mechanism.

This thesis presents how the scheduler plug-in is designed. Chapter 2 explains the Gfarm architecture, the Gfarm parallel I/O API, and the Gfarm security mechanism; it also introduces the plug-in mechanism of the LSF scheduler, the LSF API framework, and how to program against that framework. Based on these two products and their application environments, the thesis uses them as the programming environment for its data-aware scheduler.

Chapter 4 presents the key aspects of the scheduler plug-in based on the data-aware algorithm and the global data management mechanism. The key aspects of the data-aware scheduler are: 1) locating the file or its replica, 2) selecting the pre-scheduling node, 3) actual job scheduling, and 4) result notification. These four factors can also be regarded as the four phases of data-aware scheduling. The first is a necessary condition of data-aware scheduling, since the scheduler cannot continue to the next phases without finding the file by its name or keywords. The second is the key factor, since the load of the Gfarm filesystem can be balanced only if a suitable host is selected. The third, actual job scheduling, completes the real scheduling and dispatches the concrete job to the chosen host. The last, result notification, reports the execution results of jobs so that the user can decide whether to resubmit them. The data-aware algorithm is as follows:

I) The data-aware algorithm:
1) Read a Gfarm job from LSF's job waiting queue and check whether the job is labeled "Scheduled".
2) If it is labeled "Scheduled", the scheduler delegates control to the job's workflow, and the workflow decides what to do next; go to 5). Otherwise, the job is inserted into a workflow matching its specification or into a light-weight workflow. If it cannot be inserted into any workflow, go to 6).
3) After being inserted into the workflow, the job is labeled "Scheduled".
4) Label the job "Launching"; the workflow continues executing it.
5) Gather the related statistical data and the job's required information.
6) Based on the statistical result, the scheduler decides whether to create a replica.
7) Select a suitable host on which to create the replica.
8) Create a workflow for the newly created replica and reset the waiting-queue status related to this workflow.
9) Label the job's status "Scheduled".
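The following is a minimal, self-contained sketch of this scheduling loop, assuming the flow described above; all types and helpers (job_t, find_workflow, replica_needed, create_replica_workflow) are hypothetical stand-ins, not the real LSF plug-in API or Gfarm calls.

    /* Hypothetical sketch of algorithm I for one job from the waiting queue.
     * The stub helpers stand in for the LSF/Gfarm queries the thesis uses. */
    #include <stdbool.h>
    #include <stdio.h>

    typedef enum { WAITING, SCHEDULED, LAUNCHING } job_state;

    typedef struct {
        const char *file;   /* Gfarm file (or replica) the job reads   */
        int         wf;     /* workflow id; -1 means none assigned yet */
        job_state   state;
    } job_t;

    /* Illustrative stubs; the real plug-in would query LSF and Gfarm here. */
    static int  find_workflow(const job_t *j)  { return j->file ? 1 : -1; }
    static bool replica_needed(const job_t *j) { (void)j; return true; }
    static int  create_replica_workflow(void)  { return 2; }

    static void schedule_one(job_t *j)
    {
        if (j->state != SCHEDULED) {             /* steps 1-2 */
            int wf = find_workflow(j);           /* match by spec, or a   */
            if (wf >= 0) {                       /* light-weight workflow */
                j->wf = wf;
                j->state = SCHEDULED;            /* step 3 */
                j->state = LAUNCHING;            /* step 4: workflow runs it */
                return;
            }
        }
        /* Steps 5-6: gather statistics, then decide about a replica. */
        if (replica_needed(j)) {
            j->wf = create_replica_workflow();   /* steps 7-8 */
            j->state = SCHEDULED;                /* step 9 */
        }
    }

    int main(void)
    {
        job_t j = { "gfarm:/data/input", -1, WAITING };
        schedule_one(&j);
        printf("state=%d workflow=%d\n", j.state, j.wf);
        return 0;
    }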
To enhance the performance of the data grid Gfarm filesystem, the thesis also introduces a global data management mechanism. This mechanism must consider the following four aspects: 1) when to create a new replica, 2) how to select the node that holds the new replica, 3) the second pre-selection of all related jobs, and 4) how long the replica exists. Among these, the first, the time of creating a new replica, is the key factor deciding the processing efficiency of the jobs that require the same data (file replica or file fragment). The second, the selection method for the node holding the new replica, mainly decides how long creating the replica takes. The third, the second pre-selection of all related jobs, ensures that the job execution sequence determined by job priorities does not change. The last, the duration for which the replica exists, improves the ability to manage the data (file replica or file fragment) and avoids occupying disk space improperly, even though it affects the processing efficiency of the Gfarm filesystem. The global data management algorithm is as follows:

II) The global data management algorithm:
1) According to Definition 6, a replica of a file is created when the number of jobs related to that file is greater than or equal to a threshold set by the administrator.
2) According to the formula Z = k*Distance + m*Load_source + n*Load_target, calculate which host is the most suitable (a minimal sketch of this scoring step follows the algorithm). The selections of the source host and the target host correspond to each other, so the source host must be determined first, and only then is the formula evaluated. (The values of Load_source and Load_target are obtained through LSF's LIM. The source host is chosen by its load; once it is decided, job execution on that host is stopped so that copying the replica can start.)
3) Create the new replica. When a new replica is created, a new workflow is created at the same time; the scheduler updates the value of item t in the global data management table and creates a new workflow data structure.
4) Perform the second pre-selection of jobs. The strategy is the same as the job-reselection algorithm described in section 4.2.3. Note, however, that a job's status is not stable once it is labeled "Scheduled" after re-selection: because the Gfarm filesystem may create another replica and its corresponding workflow, a new round of the second pre-selection may be needed.
5) After the job is inserted into the workflow, the scheduler delegates it to this workflow for execution.
6) The job is managed by the workflow. The scheduler labels the job "Launching" and ends the dispatching phase; the job enters the execution phase.
7) The job finishes. (That the job is finished does not mean the end of data management; it means this job has ended completely.)

Notice: if this job is the last one in its workflow and the workflow's replica is not the original, the scheduler deletes the corresponding replica and then ends the job; otherwise the job finishes directly. While deleting the replica, the scheduler also updates the value of item t in the global data management table and the pointers between the parent and child data structures.
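The following is a minimal sketch of the host-selection step (algorithm II, step 2). The weights k, m, n, the host list, and the load values are illustrative, and it assumes the most suitable host is the one that minimizes Z; the thesis obtains the actual loads from LSF's LIM.

    /* Hypothetical sketch: score each candidate target host with
     * Z = k*Distance + m*Load_source + n*Load_target and pick the minimum. */
    #include <stddef.h>
    #include <stdio.h>

    typedef struct {
        const char *name;
        double distance;   /* network distance from the chosen source host */
        double load;       /* current load of this candidate target host   */
    } host_t;

    int main(void)
    {
        const double k = 1.0, m = 0.5, n = 2.0;  /* illustrative weights    */
        const double load_source = 0.3;          /* load of the source host */
        const host_t hosts[] = {
            { "node01", 1.0, 0.8 },
            { "node02", 2.0, 0.1 },
            { "node03", 4.0, 0.4 },
        };
        size_t best = 0;
        double best_z = k * hosts[0].distance + m * load_source + n * hosts[0].load;

        for (size_t i = 1; i < sizeof hosts / sizeof hosts[0]; i++) {
            double z = k * hosts[i].distance + m * load_source + n * hosts[i].load;
            if (z < best_z) { best_z = z; best = i; }
        }
        printf("replica target: %s (Z = %.2f)\n", hosts[best].name, best_z);
        return 0;
    }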
Finally, this thesis presents three kinds of tests of the scheduling plug-in based on the data-aware algorithm and the global data management mechanism: 1) scheduling without the data-aware algorithm, 2) scheduling with the data-aware algorithm but without creating a new workflow, and 3) scheduling with the data-aware algorithm and creating a new workflow. Based on the data from these three tests, the thesis concludes that when many data-intensive jobs requiring the same data (file replica or file fragment) must be processed, adopting this scheduling plug-in greatly improves efficiency.

This thesis explains some deficiencies in the scheduling of data-intensive jobs and in data management on the data grid Gfarm. It designs and develops a data-aware plug-in using the LSF plug-in mechanism, and it introduces a global data management mechanism that makes up for the deficiencies of data-intensive job management and processing in the data grid Gfarm filesystem. It can therefore serve as a reference for researchers in these areas.
Keywords/Search Tags: Data-aware