Research On Dependable Job Scheduling In Grid

Posted on: 2010-06-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y C Tao
GTID: 1118360302471130
Subject: Computer system architecture
Abstract/Summary:
As more and more resources (computing, storage, information, and so on) join the grid, its dynamic, autonomous, and heterogeneous nature becomes more pronounced. Job scheduling is the vital link between the grid system and its users. It not only determines the efficiency of the grid system but also affects user participation and acceptance, and hence the popularity of grids. To guarantee the successful execution of jobs in a dynamic grid environment and the QoS of the grid, dependable job scheduling is one of the key and pressing issues. In a grid system, various components are interconnected and work in coordination. Therefore, realizing dependable job scheduling in a dynamic grid environment requires resolving a number of key issues, including an efficient information service, reliable service selection, and effective fault-tolerance mechanisms.

This paper explores resource discovery, service selection, and fault-tolerance mechanisms in depth in order to realize reliable job scheduling in a dynamic grid environment.

Firstly, reliable resource discovery, which is accomplished by the grid information service, is the foundation of job scheduling. To guarantee the reliability of resource discovery, the paper presents a reliable DHT- and Ontology-based Information Service for grids (DIS). DIS organizes resources into a DHT ring based on the VO mode, which effectively handles the frequent joining and leaving of resource nodes and overcomes single-node failures and performance bottlenecks, thereby improving the reliability of resource discovery. Furthermore, ontology-based information integration is adopted and a novel ontology for grid resources is designed, which supports semantic information queries.
With its machine reasoning ability, ontology-based semantic resource discovery achieves higher precision and completeness than syntactic keyword matching and taxonomy-based matching, which helps guarantee the reliability of resource discovery.

Secondly, the paper proposes a resource failure simulator, which provides a basis for decision making in grid research and performance evaluation. A Markov chain-based grid resource availability prediction model is presented that can effectively predict the probability that a resource node will be available in a future time period, providing reliable resource information for subsequent job scheduling. Based on this prediction model, a Dependable Grid Workflow Scheduling mechanism (DGWS) is presented. Building on list and group scheduling, DGWS first ranks and groups the subtasks of a DAG workflow and then performs reliability cost-based scheduling for the subtasks in each group in priority order. While meeting QoS requirements, DGWS schedules jobs to reliable resource nodes, which not only improves the reliability of job execution but also balances the load across resources, so that jobs are not all scheduled to a small number of reliable nodes.

Thirdly, this paper proposes a Reliable Divisible Job Scheduling mechanism (RDJS). Based on a UMR-based algorithm, RDJS adjusts the schedule promptly when the resource pool changes, that is, when workers leave or join. Moreover, if any significant performance variance occurs, RDJS evaluates its impact and adjusts the schedule if necessary. RDJS can efficiently guarantee the reliable execution of divisible jobs, make good use of grid resources, and minimize job completion time.

Finally, from the fault-tolerance point of view, this paper explores how to guarantee the dependable execution of jobs in a dynamic grid environment. An Optimistic Checkpoint Mechanism for dynamic Grids (OCM4G) is proposed.
OCM4G dynamically determines whether to checkpoint a given job running on a given resource node by considering both the job characteristics and the resource availability. To avoid excessive checkpoint overhead, OCM4G checkpoints only those jobs that are likely to fail rather than all jobs. In addition, for the jobs it does checkpoint, OCM4G establishes optimal aperiodic checkpoint intervals based on the job characteristics and resource availability.
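
The abstract does not give the details of the availability model or the checkpoint decision rule, so the following is only a minimal sketch of how the two ideas could fit together. It assumes a two-state (up/down) first-order Markov chain estimated from an observed availability history, and a hypothetical fixed risk threshold; the function names, the state encoding, and the threshold value are illustrative assumptions, not the mechanisms defined in the dissertation.

```python
from collections import Counter

UP, DOWN = "up", "down"  # assumed two-state availability model


def estimate_transitions(history):
    """Estimate transition probabilities from a sequence of 'up'/'down' samples."""
    counts = Counter(zip(history, history[1:]))  # count consecutive state pairs
    matrix = {}
    for src in (UP, DOWN):
        total = counts[(src, UP)] + counts[(src, DOWN)]
        for dst in (UP, DOWN):
            # With no observations leaving src, fall back to an uninformative 0.5.
            matrix[(src, dst)] = counts[(src, dst)] / total if total else 0.5
    return matrix


def survival_probability(matrix, steps):
    """Probability that a node that is up now stays up for `steps` more steps."""
    return matrix[(UP, UP)] ** steps


def should_checkpoint(matrix, remaining_steps, risk_threshold=0.2):
    """Checkpoint only if the job is likely to be hit by a failure (assumed rule)."""
    failure_risk = 1.0 - survival_probability(matrix, remaining_steps)
    return failure_risk > risk_threshold


# Hypothetical history: one outage in twenty samples.
history = [UP] * 9 + [DOWN] + [UP] * 10
matrix = estimate_transitions(history)
short_job = should_checkpoint(matrix, remaining_steps=2)
long_job = should_checkpoint(matrix, remaining_steps=10)
```

Under these assumptions a short job on a mostly stable node is left without a checkpoint, while a long job on the same node crosses the risk threshold. A fuller treatment would also derive the aperiodic checkpoint intervals, which this sketch does not attempt.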
Keywords: grid computing, information service, dependable job scheduling, ontology, resource availability, resource failure, checkpoint