
Research Of Self-Learning Resources Scheduler Model Based On The Hadoop System

Posted on: 2017-09-30    Degree: Master    Type: Thesis
Country: China    Candidate: D T Zeng    Full Text: PDF
GTID: 2348330503990021    Subject: Systems analysis and integration
Abstract/Summary:
With the arrival of the information-explosion era, cloud computing and big data technologies have emerged. Hadoop is a framework that allows distributed processing of large data sets across clusters of computers using a simple MapReduce programming model. As clusters grow in size, improving cluster resource utilization, shortening task response time, and optimizing the Hadoop resource scheduler to raise cluster efficiency have become hot research topics in cloud computing.

Building on existing research and a comparison of the common resource schedulers in the Hadoop system, this dissertation improves a Self-learning Resource Scheduler model based on job classification, aiming to raise the resource utilization of heterogeneous Hadoop clusters while shortening job execution time. The dissertation completes the following research:

First, the history of the Hadoop system and the research progress on Hadoop resource schedulers at home and abroad are introduced. Second, the two cores of the Hadoop system are explained: the Hadoop Distributed File System (HDFS) and the parallel programming model MapReduce. Third, the three resource schedulers of the current Hadoop system (FIFO, Capacity Scheduler, and Fair Scheduler) are analyzed in detail; the dissertation explains their implementation principles and examines their respective advantages, disadvantages, and application scenarios. Fourth, the Hadoop system is modeled: the resources of each node are abstracted as virtual cores and memory. A virtual core has an execution-speed attribute; memory has two attributes, size and data arrival rate. Three performance indexes are used to evaluate the Hadoop system: data locality, average job completion time, and fairness.
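The node abstraction described above (virtual cores with an execution speed, memory with a size and a data arrival rate) can be sketched roughly as follows. This is a minimal illustration, not the dissertation's actual model: all class and attribute names here are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class VirtualCore:
    speed: float  # relative execution speed of this core

@dataclass
class Memory:
    size_mb: int           # memory size attribute
    arrival_rate: float    # data arrival rate attribute (e.g. MB/s)

@dataclass
class Node:
    cores: list = field(default_factory=list)
    memory: Memory = None

    def compute_capacity(self) -> float:
        # aggregate execution speed across the node's virtual cores,
        # one simple way to compare heterogeneous nodes
        return sum(c.speed for c in self.cores)

# a node with two cores of different speeds, reflecting a heterogeneous cluster
node = Node(cores=[VirtualCore(1.0), VirtualCore(1.5)],
            memory=Memory(size_mb=4096, arrival_rate=200.0))
print(node.compute_capacity())  # 2.5
```

A scheduler evaluating the three indexes (locality, average completion time, fairness) would read these attributes when deciding where to place a task.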
Fifth, the dissertation improves the Self-learning Resource Scheduler model and validates it through experiments. To begin with, a job classifier is built, with one queue for each type of job; when a job arrives, the scheduler adds it to the corresponding queue. The Self-learning Resource Scheduler maintains a quota table that records the resource demands of each kind of job, and it uses a real-time dynamic resource-allocation policy to update the quota table regularly according to historical statistics, so that a positive feedback loop is formed. In the experimental stage, three kinds of jobs are selected for comparative experiments: word count, sort, and matrix multiplication. The experiments compare job completion time and the CPU and memory utilization of the whole cluster in three cases, in which the Hadoop system uses the FIFO Scheduler, the Capacity Scheduler, and the Self-learning Resource Scheduler respectively. The conclusion is drawn that when the Self-learning Resource Scheduler handles jobs whose reduce tasks are less compute-intensive and less time-consuming, or jobs that are compute-intensive with little disk I/O, it significantly shortens job completion time and improves the resource utilization of the cluster.
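The classify-then-learn mechanism described above (one queue per job type, a quota table refreshed from historical statistics) can be sketched as below. This is only an illustrative reading of the abstract, assuming an exponential-moving-average update as the "self-learning" step; the dissertation's actual allocation policy may differ, and all names are hypothetical.

```python
from collections import deque

class SelfLearningScheduler:
    """Sketch: one queue per job type plus a self-updating quota table."""

    def __init__(self, job_types, alpha=0.3):
        self.alpha = alpha  # learning rate: weight given to new observations
        # one FIFO queue per job type, filled by the job classifier
        self.queues = {t: deque() for t in job_types}
        # quota table: resource demand recorded per job type
        self.quota = {t: {"vcores": 1.0, "mem_mb": 1024.0} for t in job_types}

    def submit(self, job_type, job):
        # the classifier routes each arriving job to its matching queue
        self.queues[job_type].append(job)

    def record_usage(self, job_type, vcores_used, mem_used_mb):
        # self-learning step: pull the quota toward observed historical
        # usage, forming the positive feedback loop described in the text
        q = self.quota[job_type]
        q["vcores"] = (1 - self.alpha) * q["vcores"] + self.alpha * vcores_used
        q["mem_mb"] = (1 - self.alpha) * q["mem_mb"] + self.alpha * mem_used_mb

sched = SelfLearningScheduler(["wordcount", "sort", "matmul"])
sched.submit("sort", {"id": 1})
sched.record_usage("sort", vcores_used=4, mem_used_mb=4096)
print(sched.quota["sort"]["vcores"])  # 1.9 (moved from 1.0 toward 4)
```

With repeated observations, each job type's quota converges toward its true demand, which is how the scheduler can size allocations per class instead of using one fixed share for all jobs.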
Keywords/Search Tags:Hadoop, Resource scheduling, Self-learning, MapReduce jobs