
Design And Implementation Of YARN Resource Scheduling Strategy Optimization Method

Posted on: 2020-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: F Z He
GTID: 2428330590473233
Subject: Computer technology
Abstract/Summary:
With the rapid development of science and technology, we have entered an era of information explosion, and exponentially growing data volumes have driven the maturation of big data analysis, distributed computing frameworks, and cloud computing technologies. To cope with a variety of data processing needs, Hadoop builds its resource management module as an independent, general-purpose system, YARN. This structure not only solves the problems of poor scalability, poor reliability, and low resource utilization in Hadoop 1.0, but also enables Hadoop YARN to support a variety of computing frameworks.

An analysis of heterogeneous workloads on multi-purpose Hadoop YARN clusters shows that tasks share cluster resources disproportionately: short-term tasks far outnumber long-term tasks, yet the small proportion of long-term tasks consumes most of the cluster's resources, which is not conducive to the execution of short-term tasks. At the same time, clusters running heterogeneous workloads often contain a large number of "allocated but unused" resource fragments, which reduces cluster resource utilization. Existing work, YARN-mix, extends YARN's original centralized scheduler into a hybrid scheduler by introducing distributed schedulers, and then separates long-term from short-term tasks so that the distributed schedulers can make use of resource fragments. This improvement goes some way toward solving the above problems, but YARN-mix classifies tasks as long-term or short-term using static time thresholds, which itself hinders the optimization of cluster resource utilization.

To address this limitation of YARN-mix, this thesis further studies how to optimize cluster resource utilization on a resource management framework with a hybrid scheduler. (1) Dynamically adjustable time thresholds, rather than static ones, are used as the criterion for classifying tasks in the cluster as long-term or short-term. (2) Load prediction techniques, combining periodic historical load records with linear regression models, are used to predict the load of the Hadoop cluster in future time periods. (3) Taking into account factors such as the running state of the cluster and its predicted load, a reinforcement learning method with online learning is used to learn a strategy for generating the optimal time threshold. (4) Because tasks can better annotate their own resource requirements within a computing framework, we choose MapReduce, Hadoop's native computing framework, and classify its tasks as long-term or short-term using a task execution time estimation model together with the time threshold produced by the threshold decision strategy. Long-term tasks are scheduled by a centralized scheduler with a global resource view, and short-term tasks are scheduled by distributed schedulers, which offer scalability and efficiency.

Based on these four points, we propose our own cluster resource management framework, YARN-P (a resource management framework based on load prediction), which further improves cluster resource utilization and shortens workload execution time. Finally, module experiments verify the availability and accuracy of the load prediction technique and the task execution time estimation model. An overall performance comparison of YARN-mix and YARN-P was then carried out using custom workloads reproduced with the SWIM tool. The experimental results show that, compared with YARN and YARN-mix, YARN-P improves average cluster CPU utilization by 21.48% and 3.68%, increases average cluster memory utilization by 34.54% and 9.30%, and reduces the execution time of multiple workloads by at least 31.02% and 9.53%, respectively.
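
Points (2) and (4) of the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the `Task` type, the choice of regressing over same-phase samples from earlier periods, and the rule that a task whose estimate equals the threshold counts as short-term are all assumptions made for illustration. The reinforcement-learning step that produces the threshold is not modeled; the threshold is simply passed in.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_next_load(history: Sequence[float], period: int) -> float:
    """Predict the load of the next time slot.

    Assumption: load is periodic, so only slots at the same phase of
    earlier periods are used as regression points.
    """
    next_index = len(history)
    same_phase = list(range(next_index % period, next_index, period))
    xs = [float(i) for i in same_phase]
    ys = [history[i] for i in same_phase]
    slope, intercept = fit_line(xs, ys)
    return slope * next_index + intercept


@dataclass
class Task:
    name: str                 # hypothetical identifier
    estimated_seconds: float  # output of some execution time estimator


def classify(tasks: Sequence[Task], threshold_seconds: float) -> Tuple[List[Task], List[Task]]:
    """Split tasks into (long_term, short_term) by estimated run time.

    Long-term tasks would go to the centralized scheduler and short-term
    tasks to the distributed schedulers.
    """
    long_term = [t for t in tasks if t.estimated_seconds > threshold_seconds]
    short_term = [t for t in tasks if t.estimated_seconds <= threshold_seconds]
    return long_term, short_term


if __name__ == "__main__":
    # Two periods of four slots each; values are made up for the example.
    load_history = [10.0, 20.0, 30.0, 40.0, 12.0, 22.0, 32.0, 42.0]
    print(predict_next_load(load_history, period=4))

    tasks = [Task("map-1", 5.0), Task("reduce-1", 120.0), Task("map-2", 30.0)]
    long_term, short_term = classify(tasks, threshold_seconds=30.0)
    print([t.name for t in long_term], [t.name for t in short_term])
```

Because the threshold is an argument to `classify`, a dynamic threshold policy only needs to supply a new value before each scheduling round; nothing else in the sketch changes.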
Keywords/Search Tags:Hadoop YARN, Hybrid Scheduler, Load Prediction, Reinforcement Learning, Task Classification