
Research And Improvement Of Job Scheduling Algorithm Based On Hadoop

Posted on: 2016-11-16
Degree: Master
Type: Thesis
Country: China
Candidate: D L Wang
GTID: 2308330470469714
Subject: Software engineering
Abstract/Summary:
Hadoop is an open source distributed computing platform modeled on Google's GFS and MapReduce technologies. It emerged in the context of cloud computing and big data, and it lets developers easily build and run applications that process massive data sets. Because Hadoop is open source, distributed, efficient, inexpensive, and reliable, within a few years it was widely adopted by industry and academia, including companies such as Facebook and Google, and it has become the mainstream distributed platform for massive data processing.

Job scheduling is one of the key technologies of the Hadoop platform and directly affects Hadoop's overall performance and resource utilization. Its main function is to control the order in which jobs execute and how computing resources are allocated. The goal is to keep the system running efficiently, making full use of computing resources while also ensuring user satisfaction. In many cases users rely on Hadoop's built-in scheduling algorithms, and these algorithms have shortcomings. Research on the built-in Hadoop scheduling algorithms therefore not only helps meet users' needs but also has significant practical value.

Hadoop currently has three built-in scheduling algorithms: FIFO, the Fair Scheduling algorithm, and the Capacity Scheduling algorithm. FIFO is simple and practical, but it is suited only to a single-system, single-user Hadoop cluster. The Fair Scheduling algorithm, contributed by Facebook, supports multiple users and multiple queues and ensures fair sharing of resources, but it can easily lead to irrational use of resources and cannot guarantee load balance among nodes while jobs are being scheduled. The Capacity Scheduling algorithm, developed by Yahoo, also supports multiple users and multiple queues, but it has an inherent defect: it does not support preemptive job scheduling, so a low-priority job cannot be temporarily swapped out to let a higher-priority job run. When users submit a large number of jobs, the Capacity Scheduling algorithm cannot meet their real-time needs; high-priority jobs are not served in time, which seriously reduces users' work efficiency. Improving the Fair Scheduling and Capacity Scheduling algorithms is therefore necessary.

To address the shortcomings of the Capacity Scheduling and Fair Scheduling algorithms, this paper proposes, respectively, the Preemptive Capacity Scheduling Policy (PCSP) and Load Balancing based on Fair Scheduler (LBFS). PCSP and LBFS are each presented in terms of a mathematical model, an algorithm flowchart, and algorithm pseudo-code.

This paper describes how PCSP and LBFS were implemented on the Hadoop platform and how the existing FIFO, Fair Scheduling, and Capacity Scheduling algorithms were deployed alongside them. PCSP was compared with the three existing algorithms by deploying all of them on the Hadoop platform, and LBFS was compared with the Fair Scheduling algorithm using load balance as the metric. The experimental results show that PCSP, the improved Capacity Scheduling algorithm, achieved its goal of supporting preemptive job scheduling and also performed well relative to Hadoop's own scheduling algorithms. LBFS, in turn, achieved load balance among nodes when scheduling jobs.
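The two ideas named in the abstract, preempting a low-priority job and balancing load across nodes, can be illustrated with a small toy model. The sketch below is not the thesis's PCSP or LBFS, whose mathematical models and pseudo-code are not reproduced in this abstract, and it does not use any Hadoop API. The `Job` class, the `submit`, `least_loaded`, and `imbalance` functions, the "larger number means higher priority" convention, and the use of standard deviation as a load-balance measure are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import pstdev


@dataclass
class Job:
    name: str
    priority: int  # assumed convention: larger value = more urgent


def submit(running, waiting, job, slots):
    """Place `job` on a toy cluster with `slots` slots, preempting if needed.

    Returns the preempted job, or None if nothing was swapped out.
    """
    if len(running) < slots:
        running.append(job)          # a free slot exists: just run the job
        return None
    victim = min(running, key=lambda j: j.priority)
    if victim.priority < job.priority:
        running.remove(victim)       # swap out the lowest-priority running job
        waiting.append(victim)       # it waits until a slot frees up
        running.append(job)
        return victim
    waiting.append(job)              # nothing of lower priority to preempt
    return None


def least_loaded(node_load):
    """Pick the node currently holding the fewest tasks (ties: first listed)."""
    return min(node_load, key=node_load.get)


def imbalance(node_load):
    """Population standard deviation of per-node task counts (0 = balanced)."""
    return pstdev(list(node_load.values()))
```

With two slots occupied by jobs of priority 1 and 2, submitting a job of priority 5 swaps out the priority-1 job, which is the behavior the abstract says the stock Capacity Scheduling algorithm lacks. Assigning each new task to `least_loaded(node_load)` keeps `imbalance(node_load)` near zero, which is one simple way to express the load-balance goal attributed to LBFS.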
Keywords/Search Tags: Hadoop, PCSP, LBFS, Capacity Scheduler, Fair Scheduler, MapReduce