Analysis And Improvement Of Job Scheduling Algorithms In Hadoop

Posted on:2016-07-31

Degree:Master

Type:Thesis

Country:China

Candidate:C C Li

Full Text:PDF

GTID:2348330476955758

Subject:Computer Science and Technology

Abstract/Summary:


With the rapid development of the Internet, the number of Internet users has grown quickly and digital information has grown explosively, making big data analysis and processing a research hot spot. After Google introduced its big data computing framework MapReduce and its distributed file system GFS, the open-source Hadoop, which was designed on the same ideas, developed rapidly and has become the most popular platform for big data processing. Hadoop gives developers a simple interface: they need only write the map and reduce functions, while job scheduling arranges the execution of jobs and tasks without user intervention. The job scheduler is one of Hadoop's core modules. Its goal is to make the fullest use of cluster resources by ordering the execution of many jobs sensibly and selecting tasks appropriately. Hadoop currently offers three job scheduling algorithms: the FIFO Scheduler, the Capacity Scheduler and the Fair Scheduler. The FIFO Scheduler is simple and easy to implement, but it does not support resource sharing among multiple users and jobs. The Capacity and Fair Schedulers support sharing cluster resources, which increases throughput and reduces response time, but they require complicated configuration and an administrator who fully understands the cluster's resources and the types of users and jobs. Building on research on Hadoop in China and abroad, this thesis analyzes the core ideas and scheduling policies of the existing algorithms and improves the slot allocation algorithm in the Fair Scheduler. After analyzing the advantages and disadvantages of these algorithms, it proposes a scheduling algorithm based on Bayesian classification that avoids the complicated configuration they require. Through Bayesian learning and classification, the algorithm is designed to run jobs on nodes without overloading them.
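The abstract does not specify the classifier's features or model details. As one illustration, the Python sketch below uses a categorical naive Bayes classifier, assuming each historical run is recorded as a tuple of discrete levels (job type, node CPU load, node I/O load) together with whether the node became overloaded. The class `BayesJobClassifier`, the helper `pick_node`, the feature levels and the training history are all hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter, defaultdict


class BayesJobClassifier:
    """Categorical naive Bayes labelling a (job, node) pairing as
    schedulable ("good") or overloading ("bad").

    Features are hypothetical discrete levels: (job_type, node_cpu, node_io).
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # Laplace smoothing constant
        self.class_counts = Counter()               # label -> number of samples
        self.value_counts = defaultdict(Counter)    # (label, feature index) -> value counts
        self.seen_values = defaultdict(set)         # feature index -> observed values

    def learn(self, features, overloaded):
        """Record one historical execution and whether it overloaded the node."""
        label = "bad" if overloaded else "good"
        self.class_counts[label] += 1
        for i, value in enumerate(features):
            self.value_counts[(label, i)][value] += 1
            self.seen_values[i].add(value)

    def _log_score(self, features, label):
        total = sum(self.class_counts.values())
        n = self.class_counts[label]
        # Smoothed prior P(label) over the two classes.
        score = math.log((n + self.alpha) / (total + 2 * self.alpha))
        for i, value in enumerate(features):
            k = len(self.seen_values[i] | {value})  # number of possible values
            count = self.value_counts[(label, i)][value]
            # Smoothed likelihood P(feature_i = value | label), features assumed independent.
            score += math.log((count + self.alpha) / (n + self.alpha * k))
        return score

    def schedulable(self, features):
        return self._log_score(features, "good") > self._log_score(features, "bad")


def pick_node(clf, job_type, nodes):
    """Return the first node (name, cpu_level, io_level) predicted not to overload."""
    for name, cpu_level, io_level in nodes:
        if clf.schedulable((job_type, cpu_level, io_level)):
            return name
    return None  # no safe node: leave the job queued


# Hypothetical scheduling history: (features, did the node overload?)
HISTORY = [
    (("cpu", "low", "low"), False),
    (("cpu", "low", "low"), False),
    (("cpu", "low", "low"), False),
    (("io", "low", "low"), False),
    (("io", "low", "low"), False),
    (("cpu", "high", "low"), True),
    (("cpu", "high", "low"), True),
    (("io", "low", "high"), True),
    (("io", "low", "high"), True),
    (("cpu", "high", "high"), True),
]

clf = BayesJobClassifier()
for features, overloaded in HISTORY:
    clf.learn(features, overloaded)

# node1 is busy on both resources, node2 is lightly loaded.
print(pick_node(clf, "cpu", [("node1", "high", "high"), ("node2", "low", "low")]))  # node2
```

Naive Bayes is a natural fit for this reading because it learns directly from logged outcomes, so no per-pool capacities or weights have to be configured by hand; the price is the independence assumption between job and node features.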
The thesis also pre-processes jobs into CPU-intensive and I/O-intensive classes according to their resource requirements, so that computing resources are used more effectively. Its contents are as follows.

First, it analyzes and compares in depth the FIFO, Capacity and Fair Schedulers in Hadoop, covering their core ideas, configuration, pseudocode, flowcharts with complexity analysis, features, advantages and disadvantages. It then improves the slot allocation algorithm in the Fair Scheduler so that remaining slots are allocated as fairly as possible.

Second, it proposes a scheduling algorithm that reduces the complicated configuration required by existing scheduling algorithms. The algorithm uses a Bayesian classifier, trained on the history of job scheduling and execution and on features of jobs and nodes, to classify jobs as schedulable or not schedulable. It thus schedules jobs so that nodes are overloaded as little as possible, improving scheduling accuracy and node resource usage.

Third, it proposes a pre-processing step that classifies jobs as CPU-intensive or I/O-intensive and schedules the two classes separately to improve resource usage.

Fourth, it selects different types of typical jobs for experiments and defines methods for evaluating the algorithm. It then reports results for scheduling accuracy, response time and cluster resource usage, and analyzes them in comparison with the existing scheduling algorithms.
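The abstract says only that the improved Fair Scheduler allocates remaining slots "as fairly as possible". One common way to do that with indivisible slots is weighted max-min fairness: hand each slot to the pool that currently holds the fewest slots relative to its weight, and stop serving a pool once its demand is met. The Python sketch below assumes pools described by a slot demand and an optional positive weight; `allocate_slots` is a hypothetical name, and this is not necessarily the algorithm used in the thesis.

```python
import heapq


def allocate_slots(total_slots, demands, weights=None):
    """Weighted max-min fair allocation of integer slots among pools.

    demands: pool name -> number of slots its runnable tasks could use.
    weights: pool name -> positive weight (defaults to 1.0 for every pool).
    Slots go one at a time to the unsatisfied pool with the lowest
    allocated/weight ratio, so slots are spread as evenly as integer
    units allow and no pool receives more than its demand.
    """
    if weights is None:
        weights = {pool: 1.0 for pool in demands}
    alloc = {pool: 0 for pool in demands}
    # Heap entries: (allocated / weight, pool name); ties break by name.
    heap = [(0.0, pool) for pool, demand in demands.items() if demand > 0]
    heapq.heapify(heap)
    remaining = total_slots
    while remaining > 0 and heap:
        _, pool = heapq.heappop(heap)
        alloc[pool] += 1
        remaining -= 1
        if alloc[pool] < demands[pool]:
            heapq.heappush(heap, (alloc[pool] / weights[pool], pool))
    return alloc  # slots beyond total demand stay unallocated


# Pool "a" needs only 2 slots; the 8 left over are split evenly between "b" and "c".
print(allocate_slots(10, {"a": 2, "b": 8, "c": 8}))  # {'a': 2, 'b': 4, 'c': 4}
```

With weights `{"x": 2.0, "y": 1.0}`, nine slots and a demand of ten for each pool, the same function gives pool x six slots and pool y three, matching the 2:1 weight ratio.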
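A minimal Python sketch of the CPU-intensive versus I/O-intensive pre-processing step follows. It assumes each job has a profile giving CPU seconds and megabytes of I/O per task, and that a job counts as CPU-intensive when its CPU-to-I/O ratio exceeds a tunable threshold. The `JobProfile` fields, the threshold of 1.0, and the dispatch rule (serve the class whose resource is less busy on the requesting node) are illustrative assumptions; the abstract does not state how the thesis classifies or dispatches jobs.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class JobProfile:
    name: str
    cpu_seconds: float   # CPU time per task (hypothetical profiling metric)
    io_megabytes: float  # data read plus written per task (hypothetical)


def classify(job, threshold=1.0):
    """Label a job "cpu" if it uses more than `threshold` CPU seconds per MB of I/O."""
    if job.io_megabytes == 0:
        return "cpu"
    return "cpu" if job.cpu_seconds / job.io_megabytes > threshold else "io"


def split_queues(jobs, threshold=1.0):
    """Pre-processing step: place each job in a CPU-intensive or I/O-intensive queue."""
    queues = {"cpu": deque(), "io": deque()}
    for job in jobs:
        queues[classify(job, threshold)].append(job)
    return queues


def next_job(queues, node_cpu_util, node_io_util):
    """Serve the queue whose resource is less busy on this node, else the other one."""
    preferred = "cpu" if node_cpu_util <= node_io_util else "io"
    other = "io" if preferred == "cpu" else "cpu"
    for kind in (preferred, other):
        if queues[kind]:
            return queues[kind].popleft()
    return None


jobs = [
    JobProfile("job-a", cpu_seconds=20.0, io_megabytes=200.0),
    JobProfile("job-b", cpu_seconds=300.0, io_megabytes=1.0),
    JobProfile("job-c", cpu_seconds=5.0, io_megabytes=100.0),
]
queues = split_queues(jobs)
# A node with an idle CPU but busy disks takes the CPU-intensive job first.
print(next_job(queues, node_cpu_util=0.2, node_io_util=0.9).name)  # job-b
```

Keeping two queues lets a node that is saturated on one resource still accept work that mostly stresses the other, which is the resource-usage benefit the abstract attributes to scheduling the classes separately.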

Keywords/Search Tags:

Hadoop, MapReduce, Job Scheduling, Bayesian Classification


Related items

1	Research On Classification Algorithm Used HADOOP
2	Research On Scheduling Algorithm In Hadoop Mapreduce
3	The Mapreduce Model In The Hadoop Implementation Of Performance Analysis And Optimization Improvements
4	Design Of Mapreduce Task Scheduling Algorithms In Heterogeneous Hadoop Cluster
5	The Research Of MapReduce Job Scheduling Algorithm Based On The Hadoop Platform
6	Research And Implement Of Job Scheduling Method For Multi_User MapReduce Clusters
7	Research And Implement Of Job Scheduling Method For Multi_user Mapreduce Clusters
8	Research And Improvement Of Job Scheduling Algorithms On Hadoop Platform
9	An Optimized MapReduce Workflow Scheduling Algorithm For Heterogeneous Computing
10	Research On Hadoop Cluster Scheduling Optimization