Research And Improvement Of Hadoop YARN Resource Allocation Mechanism

Posted on: 2018-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Chen
Full Text: PDF
GTID: 2428330569485428
Subject: Computer technology
Abstract/Summary:
Hadoop YARN belongs to the second generation of Hadoop, the distributed storage and parallel computing framework; it is the cluster resource management system that was split out of the original MapReduce. The resource scheduler is the core component of YARN: it allocates and schedules computing resources for the applications running on the cluster. YARN ships with three built-in schedulers, but as YARN supports more and more types of applications, these schedulers can no longer meet application needs. Researching and improving the scheduler's resource allocation mechanism therefore has practical significance: it raises the utilization of cluster resources and reduces application computing time, which improves system performance and responsiveness.

Based on a study of the built-in Hadoop YARN schedulers, this thesis identifies problems in three parts of the resource allocation mechanism (queue selection, job selection, and container allocation) and proposes an optimization for each. First, the existing queue selection mechanism has a fixed selection strategy, is complex to configure, and consumes considerable resources; a hybrid queue selection strategy based on queue load is proposed to address this. Second, to shorten the long job response times produced by the job selection mechanism, a method for predicting the remaining size of a job is designed, and the job selection strategy is improved to use that remaining size. Third, the existing container allocation strategy uses the delay scheduling algorithm to improve the data locality of computing tasks, but its delay threshold is fixed and must be configured manually. This makes it hard to use and yields poor data locality, which lengthens job computing time. To solve this problem, a method is designed that sets the delay threshold dynamically from task data and node load feedback, improving the data locality of container allocation and thus shortening job computing time.

The experiments used a computing cluster of 4 high-performance x86-64 servers as the test platform. Using a variety of test cases, each of the three optimization schemes was validated and analyzed, and the optimized scheduler was then tested and analyzed as a whole. The experimental results show that the three optimization schemes effectively improve the resource allocation efficiency of the existing schedulers, reduce the average response time of jobs in the cluster, and shorten job computing time.
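
The idea of a delay threshold that adapts to task data and node load can be illustrated with a small sketch. The abstract does not give the thesis's actual formula, so everything below is an assumption made for illustration: the names `Task`, `delay_threshold`, and `offer`, the parameters `max_skips` and `large_input_mb`, and the rule that the threshold grows with the task's input size and with the free capacity of the least-loaded node holding its data. It is not the author's method and not YARN's API.

```python
import math
from dataclasses import dataclass


@dataclass
class Task:
    input_mb: float         # size of the task's input data
    local_nodes: frozenset  # nodes that hold a copy of that data
    skipped: int = 0        # non-local offers declined so far


def delay_threshold(task, node_load, max_skips=10, large_input_mb=128.0):
    """Hypothetical rule: how many non-local offers the task may decline.

    node_load maps node name -> load in [0, 1]. Waiting is worth more when
    the input is large (costly to move) and when a node holding the data
    has spare capacity (a local slot is likely to free up soon).
    """
    data_weight = min(task.input_mb / large_input_mb, 1.0)
    local_loads = [node_load[n] for n in task.local_nodes if n in node_load]
    if not local_loads:
        return 0  # no known local node, so waiting cannot help
    availability = 1.0 - min(local_loads)
    return math.floor(max_skips * data_weight * availability)


def offer(task, node, node_load):
    """Delay scheduling step: return True if the task accepts a container on node."""
    if node in task.local_nodes:
        return True  # data-local placement, always accept
    if task.skipped >= delay_threshold(task, node_load):
        return True  # waited long enough, give up on locality
    task.skipped += 1  # decline and wait for a better offer
    return False
```

With a fixed threshold, every task would decline the same number of non-local offers. In this sketch a task with a 128 MB input whose only local node is half loaded declines five offers before running remotely, while a task whose local node is fully loaded accepts the first offer it receives.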
Keywords/Search Tags: Hadoop YARN, resource scheduling, queue selection, job scheduling, delay scheduling