
Research on SLA-Based Scheduling Mechanism in MapReduce Environments

Posted on: 2015-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: J Wang
Full Text: PDF
GTID: 2268330431453431
Subject: Computer software and theory
Abstract/Summary:
MapReduce, as an effective solution for data analysis and processing, has been widely applied in the field of large-scale data processing. With its growing popularity, an increasing number of service providers offer MapReduce as a business service. The service providers execute MapReduce jobs to implement a series of business logic and return the results of data analysis and processing to users. To protect the interests of both parties, users sign Service Level Agreements (SLAs) with service providers. Service providers must comply with the SLAs and satisfy users’ requirements on job response time; otherwise, they have to pay penalties. Therefore, how to schedule jobs and tasks effectively in MapReduce environments so that users’ SLAs are satisfied is attracting more and more attention from service providers.

The differentiation and personalization of SLAs, together with cluster sharing, make this problem challenging:
1) Different user requirements increase the diversity of MapReduce job types. Ad hoc query jobs, production batch jobs and machine learning jobs may run in the cluster at the same time. Even among jobs that analyze the same data sets, short interactive jobs and long batch jobs may be mixed together. Accordingly, the job response time requirements defined in users’ SLAs differ widely.
2) Service providers share MapReduce clusters among multiple user groups that work on the same data sets, which avoids the network and storage costs of building independent clusters and transferring data among them. However, sharing means that the performance of a job is affected by the execution of other concurrent jobs, which makes it harder to satisfy users’ SLAs.

Existing scheduling mechanisms in MapReduce environments focus on the fair sharing of cluster resources among multiple users, and they allocate resources and schedule jobs with priority-based strategies. However, such mechanisms have little awareness of users’ SLAs. Reflecting the differences among users’ SLAs through the priorities of the corresponding jobs is inaccurate, and it is difficult to map a user’s SLA to a particular priority. Additionally, these mechanisms are not aware of changes in cluster running status and job execution status, so users’ SLAs cannot be satisfied accurately and effectively.

To address the challenges presented above, we propose an SLA-based scheduling mechanism for MapReduce and approach the problem of guaranteeing users’ SLAs from the following aspects: dynamic modeling of job performance, job scheduling, and task scheduling optimization. The main work and contributions are as follows:
1. We propose an SLA-based MapReduce scheduling architecture and add a pluggable scheduling support node, which can flexibly support users’ SLAs at two scheduling levels. Under this architecture, we then propose a dynamic adaptive MapReduce job performance model. Based on historical running statistics and on the current cluster and job running status, the model predicts whether potential SLA violations may occur.
2. Combined with the job performance model, an SLA-based two-stage job scheduling mechanism is proposed to satisfy users’ differentiated SLAs in MapReduce environments. For each job, the mechanism predicts the minimum resource share needed to keep the response time within the bound defined in the SLA, as well as the corresponding marginal gain. Based on these predictions, we perform resource partitioning and job scheduling with the goals of maximizing the satisfaction of SLAs, avoiding blind allocation of surplus cluster resources, and improving the total profit gained by service providers.
3. Based on the job scheduling strategy, we propose a data-aware task assignment optimization mechanism. It reduces, as far as possible, the data movement involved in task execution in order to improve task execution efficiency, shorten job response time, and optimize the SLA satisfaction rate through the feedback control loop of the scheduling architecture. According to the different characteristics of map tasks and reduce tasks, the mechanism examines the distribution of the input data sets and of the intermediate key-value pairs, and assigns tasks based on the data locality scheduling weight of map tasks and the data transfer cost of reduce tasks, respectively.
4. Experiments are carried out to evaluate the accuracy of the proposed job performance model and the effectiveness of the SLA-based job scheduling strategy and the task assignment optimization strategy.
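
The two-stage job scheduling idea described in the abstract (first reserve each job's minimum share, then distribute surplus resources by marginal gain) can be illustrated with a small Python sketch. This is not the thesis's actual algorithm, which the abstract does not spell out. The `Job` fields, the fluid performance model in `predicted_seconds`, the penalty-based ordering in stage 1 and the definition of marginal gain as seconds saved per extra slot are all assumptions made here for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    remaining_tasks: int        # tasks still to run
    avg_task_seconds: float     # mean task duration (e.g. from historical statistics)
    seconds_to_deadline: float  # time left before the SLA response-time bound
    penalty: float              # hypothetical penalty paid if the SLA is violated


def predicted_seconds(job, slots):
    """Assumed fluid model: remaining work spreads evenly over the slots."""
    return job.remaining_tasks * job.avg_task_seconds / slots


def min_slots(job):
    """Smallest slot count whose predicted response time meets the SLA bound."""
    work = job.remaining_tasks * job.avg_task_seconds
    return max(1, math.ceil(work / job.seconds_to_deadline))


def marginal_gain(job, slots):
    """Seconds saved by one extra slot; zero once every task has its own slot."""
    if slots >= job.remaining_tasks:
        return 0.0
    return predicted_seconds(job, slots) - predicted_seconds(job, slots + 1)


def allocate(jobs, total_slots):
    """Return a dict mapping job name to slot count (0 means not admitted)."""
    shares = {job.name: 0 for job in jobs}
    free = total_slots

    # Stage 1: reserve minimum shares, cheapest guarantee per unit of penalty first.
    for job in sorted(jobs, key=lambda j: min_slots(j) / j.penalty):
        need = min_slots(job)
        # Skip jobs that cannot meet the bound even with one slot per task,
        # and jobs whose minimum share no longer fits in the cluster.
        if need <= job.remaining_tasks and need <= free:
            shares[job.name] = need
            free -= need

    # Stage 2: hand out surplus slots one by one to the admitted job that
    # gains the most, and stop as soon as no job benefits any more.
    admitted = [j for j in jobs if shares[j.name] > 0]
    while free > 0 and admitted:
        best = max(admitted, key=lambda j: marginal_gain(j, shares[j.name]))
        if marginal_gain(best, shares[best.name]) <= 0:
            break
        shares[best.name] += 1
        free -= 1
    return shares


if __name__ == "__main__":
    demo = [
        Job("adhoc", 20, 5.0, 50.0, 1.0),
        Job("batch", 100, 10.0, 200.0, 5.0),
    ]
    print(allocate(demo, 12))
```

Stopping stage 2 when the best marginal gain reaches zero is one simple way to avoid handing out surplus slots blindly; a profit-based gain function could be substituted without changing the structure.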
Keywords/Search Tags: MapReduce, SLA, Scheduling Mechanism, Adaptive Performance Model, Data-Aware