
Research Of Self-Learning Resources Scheduler Model Based On The Hadoop System

Posted on: 2017-09-30    Degree: Master    Type: Thesis
Country: China    Candidate: D T Zeng    Full Text: PDF
GTID: 2348330503990021    Subject: Systems analysis and integration
Abstract/Summary:
With the arrival of the information-explosion era, cloud computing and big data technologies have emerged. Hadoop is a framework that allows distributed processing of large data sets across clusters of computers using a simple MapReduce programming model. As clusters grow in size, improving cluster resource utilization, shortening task response time, and optimizing the Hadoop resource scheduler to raise cluster efficiency have become hot research topics in cloud computing.

Building on existing research and a comparison of the common resource schedulers in the Hadoop system, this dissertation improves a Self-learning Resource Scheduler model based on job classification, aiming to raise the resource utilization of heterogeneous Hadoop clusters while shortening job execution time. The dissertation completes the following research:

First, the history of the Hadoop system and the research progress on Hadoop resource schedulers at home and abroad are introduced. Second, the two cores of the Hadoop system are explained: the Hadoop Distributed File System (HDFS) and the parallel programming model MapReduce. Third, the three resource schedulers of the current Hadoop system (FIFO, Capacity Scheduler, and Fair Scheduler) are analyzed in detail; the dissertation explains their implementation principles and examines their respective advantages, disadvantages, and application scenarios. Fourth, the Hadoop system is modeled: the resources of each node are abstracted as virtual cores and memory. A virtual core has an execution-speed attribute; memory has two attributes, size and data arrival rate. Three performance indexes are used to evaluate the Hadoop system: data locality, average job completion time, and fairness.
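The node abstraction described above (virtual cores with an execution speed, memory with a size and a data arrival rate) can be sketched roughly as follows. This is a minimal illustration, not the dissertation's actual model: all class and attribute names here are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class VirtualCore:
    speed: float  # relative execution speed of this core

@dataclass
class Memory:
    size_mb: int           # memory size attribute
    arrival_rate: float    # data arrival rate attribute (e.g. MB/s)

@dataclass
class Node:
    cores: list = field(default_factory=list)
    memory: Memory = None

    def compute_capacity(self) -> float:
        # aggregate execution speed across the node's virtual cores,
        # one simple way to compare heterogeneous nodes
        return sum(c.speed for c in self.cores)

# a node with two cores of different speeds, reflecting a heterogeneous cluster
node = Node(cores=[VirtualCore(1.0), VirtualCore(1.5)],
            memory=Memory(size_mb=4096, arrival_rate=200.0))
print(node.compute_capacity())  # 2.5
```

A scheduler evaluating the three indexes (locality, average completion time, fairness) would read these attributes when deciding where to place a task.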
Fifth, the dissertation improves the Self-learning Resource Scheduler model and validates it through experiments. To begin with, a job classifier is built, with one queue for each type of job; when a job arrives, the scheduler adds it to the corresponding queue. The Self-learning Resource Scheduler maintains a quota table that records the resource demands of each kind of job, and it uses a real-time dynamic resource-allocation policy to update the quota table regularly according to historical statistics, so that a positive feedback loop is formed. In the experimental stage, three kinds of jobs are selected for comparative experiments: word count, sort, and matrix multiplication. The experiments compare job completion time and the CPU and memory utilization of the whole cluster in three cases, in which the Hadoop system uses the FIFO Scheduler, the Capacity Scheduler, and the Self-learning Resource Scheduler respectively. The conclusion is drawn that when the Self-learning Resource Scheduler handles jobs whose reduce tasks are less compute-intensive and less time-consuming, or jobs that are compute-intensive with little disk I/O, it significantly shortens job completion time and improves the resource utilization of the cluster.
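The classify-then-learn mechanism described above (one queue per job type, a quota table refreshed from historical statistics) can be sketched as below. This is only an illustrative reading of the abstract, assuming an exponential-moving-average update as the "self-learning" step; the dissertation's actual allocation policy may differ, and all names are hypothetical.

```python
from collections import deque

class SelfLearningScheduler:
    """Sketch: one queue per job type plus a self-updating quota table."""

    def __init__(self, job_types, alpha=0.3):
        self.alpha = alpha  # learning rate: weight given to new observations
        # one FIFO queue per job type, filled by the job classifier
        self.queues = {t: deque() for t in job_types}
        # quota table: resource demand recorded per job type
        self.quota = {t: {"vcores": 1.0, "mem_mb": 1024.0} for t in job_types}

    def submit(self, job_type, job):
        # the classifier routes each arriving job to its matching queue
        self.queues[job_type].append(job)

    def record_usage(self, job_type, vcores_used, mem_used_mb):
        # self-learning step: pull the quota toward observed historical
        # usage, forming the positive feedback loop described in the text
        q = self.quota[job_type]
        q["vcores"] = (1 - self.alpha) * q["vcores"] + self.alpha * vcores_used
        q["mem_mb"] = (1 - self.alpha) * q["mem_mb"] + self.alpha * mem_used_mb

sched = SelfLearningScheduler(["wordcount", "sort", "matmul"])
sched.submit("sort", {"id": 1})
sched.record_usage("sort", vcores_used=4, mem_used_mb=4096)
print(sched.quota["sort"]["vcores"])  # 1.9 (moved from 1.0 toward 4)
```

With repeated observations, each job type's quota converges toward its true demand, which is how the scheduler can size allocations per class instead of using one fixed share for all jobs.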
Keywords/Search Tags:Hadoop, Resource scheduling, Self-learning, MapReduce jobs