
The Research Of Job Scheduling Algorithm In MapReduce-styled Massive Data Processing Platform

Posted on: 2016-12-16
Degree: Master
Type: Thesis
Country: China
Candidate: S K Liu
GTID: 2308330503450650
Subject: Computer technology
Abstract/Summary:
As networks have become pervasive in daily life, the volume of data has surged, and big data has become one of the most active research areas in computer science. MapReduce-styled data processing platforms are a recent result of research on massive data processing. Their distinguishing features are a simplified parallel programming model, a computing mode based on the principle of data locality, and fine-grained resource management.

Job scheduling is a core function of the MapReduce model. It is responsible for managing the cluster's computing resources and for scheduling jobs, so that users' jobs share the computing resources of the MapReduce platform fairly.

This thesis analyzes the job scheduling model of the MapReduce platform in depth. Existing delay scheduling algorithms have problems with the execution efficiency of short jobs, with jobs left waiting, and with idle resources left waiting. The thesis therefore studies an improved delay scheduling algorithm that takes efficient job execution as its primary goal while also considering fairness among jobs. It studies the task-to-resource mapping model, the reasonable allocation of idle resources, and the priority scheduling of starving jobs. The main contributions are as follows.

(1) A model of the correspondence between tasks and computing resources on the MapReduce platform is designed. The data organization of the jobs waiting to be scheduled is modified while the relationship between jobs and tasks on the MapReduce platform is preserved. The single job queue is replaced by multiple task queues, one per computing node. Each node's queue stores the tasks that can run with local data on that node, which reduces the time a computing node spends looking for a data-local task.

(2) A job scheduling strategy based on the real-time load of the cluster is designed. The improved strategy sets the waiting time per unit of resource, replacing the per-job waiting time of the existing delay scheduling algorithm, which reduces the number of waits generated in each round of job scheduling. Taking into account the cluster's current network conditions and a record of data-local task arrivals at idle computing nodes, the algorithm either assigns a data-local task to an idle node or, where reasonable, assigns it a non-data-local task. The algorithm also monitors waiting times in the cluster and gives priority to starving jobs, which improves the execution efficiency of the system.

(3) A multi-weight formula for calculating task priority is presented. A priority weight is computed from the attributes of each task, and the tasks are sorted by the resulting weights. The algorithm schedules first the short tasks and the tasks whose execution proportion is large, which increases the throughput of the system.

(4) The improved algorithm is tested and analyzed on the Hadoop platform. Under the same experimental conditions, it is compared with the FIFO algorithm and the traditional delay scheduling algorithm on three metrics: data locality, average waiting time, and response time. The results show that the improved algorithm outperforms the other two. Compared with the delay scheduling algorithm, the Task-Job algorithm reduces the average waiting time of jobs by 80.64%.
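
The ideas in contributions (1) to (3) can be illustrated with a minimal Python sketch. It is not the thesis's Task-Job algorithm, whose formula and data structures are not given in this abstract. The class and function names, the task attributes, the weights, and the thresholds are all hypothetical, and "execution proportion" is read here as the fraction of a job that has already finished, which is an assumption. The sketch keeps one task list per node, attaches the waiting time to the idle node instead of to the job, and ranks tasks by a weighted sum of attributes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    job_id: str
    replica_nodes: tuple   # nodes that hold this task's input split
    est_runtime: float     # estimated runtime in seconds (hypothetical attribute)
    job_progress: float    # fraction of the owning job already finished, 0..1
    waited: float = 0.0    # seconds this task has waited so far


def priority(task, w_short=0.4, w_progress=0.3, w_wait=0.3,
             max_runtime=600.0, starvation=300.0):
    """Hypothetical multi-weight priority; the weights are illustrative only."""
    short = 1.0 - min(task.est_runtime, max_runtime) / max_runtime
    wait = min(task.waited, starvation) / starvation
    return w_short * short + w_progress * task.job_progress + w_wait * wait


class NodeQueueScheduler:
    """Per-node task queues with a waiting time attached to the idle node."""

    def __init__(self, node_wait_limit=5.0):
        self.node_queues = defaultdict(list)  # node -> tasks with local data there
        self.pending = {}                     # task_id -> task not yet scheduled
        self.node_idle_since = {}             # node -> time it first found no local task
        self.node_wait_limit = node_wait_limit

    def submit(self, task):
        self.pending[task.task_id] = task
        for node in task.replica_nodes:
            self.node_queues[node].append(task)

    def _take(self, task):
        # Remove the task from the pending set and from every node queue.
        del self.pending[task.task_id]
        for node in task.replica_nodes:
            self.node_queues[node].remove(task)
        return task

    def assign(self, node, now):
        """Return (task, is_local) for an idle node, or (None, False) to wait."""
        local = self.node_queues.get(node, [])
        if local:
            # A data-local task is found directly in the node's own queue.
            self.node_idle_since.pop(node, None)
            return self._take(max(local, key=priority)), True
        if not self.pending:
            return None, False
        first_idle = self.node_idle_since.setdefault(node, now)
        if now - first_idle < self.node_wait_limit:
            return None, False  # keep the node waiting for a local task
        # The node has waited long enough: fall back to a non-local task.
        self.node_idle_since.pop(node, None)
        return self._take(max(self.pending.values(), key=priority)), False
```

In this sketch a node with a non-empty queue never scans the tasks of other nodes, which corresponds to the reduced lookup time described in (1). The starvation term in `priority` grows with `waited`, so a long-waiting task eventually outranks shorter ones, provided the caller updates `waited` as time passes.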
Keywords/Search Tags: MapReduce, Hadoop, Job scheduling