
Research And Implementation Of Hadoop Job Scheduling Algorithm In Heterogeneous Environment

Posted on: 2019-06-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y Tian
Full Text: PDF
GTID: 2348330545461546
Subject: Computer Science and Technology
Abstract/Summary:
The maturing of cloud computing technology provides enterprises with a viable, inexpensive solution for big data processing. Hadoop is an open-source distributed storage and parallel computing platform of the Apache Foundation. Because of its high reliability, easy scalability, and high fault tolerance, Hadoop is widely used in big data processing. In cloud computing applications, job scheduling and resource allocation have always been key issues. A cloud computing platform often has multiple jobs to schedule at the same time, and each job is subdivided into several subtasks that run independently, so coordinating the resource allocation and scheduling of each task is crucial. YARN, Hadoop's resource scheduling and management framework, provides three built-in resource schedulers (FIFO, Capacity, and Fair). However, as applications have expanded, the existing resource scheduling algorithms can no longer meet requirements and in many cases restrict the performance of the Hadoop system. Research on rational resource allocation and job scheduling can therefore improve the system's resource utilization, reduce job execution time, and ultimately improve the platform's overall performance. The research content of this thesis is summarized as follows:

(1) The resource scheduling problem is NP-hard, i.e., no algorithm is known that finds an optimal solution in polynomial time. Swarm intelligence and related metaheuristic algorithms, such as the genetic algorithm, the ant colony algorithm, and the cuckoo algorithm, perform well on such problems. The cuckoo algorithm is a new and efficient swarm intelligence algorithm, but it also has shortcomings, such as weak convergence and low precision. To address them, the thesis proposes a hybrid genetic cuckoo optimization algorithm that incorporates a genetic algorithm into the cuckoo algorithm. While retaining the strong global search capability of the cuckoo algorithm, it draws on the good local convergence of the genetic algorithm to accelerate convergence in the later stages of the search.

(2) The thesis analyzes the resource management and allocation mechanism of Hadoop YARN in depth, models the Hadoop resource management and job scheduling process, and applies the hybrid genetic cuckoo algorithm to YARN resource scheduling. Through YARN's management mechanism, the algorithm obtains information such as each node's CPU rate, memory capacity, and load, and assigns each task to a suitable resource node according to the task's resource requests. The scheduling process also adjusts task priorities: the priorities of tasks with large resource demands and of tasks that are close to completion are raised, so that tasks with large resource demands do not go unscheduled for a long time and nearly completed tasks are not left waiting for a long time.

The research and test results show that the hybrid genetic cuckoo algorithm presented in this thesis outperforms the cuckoo algorithm in finding the optimal value of standard test functions. When applied to Hadoop resource scheduling, the algorithm effectively improves system resource utilization and shortens the cluster's job execution time.
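The abstract does not say how the genetic algorithm is combined with the cuckoo algorithm, so the following is only a minimal sketch of one plausible hybrid, not the thesis's method. It assumes a standard cuckoo search (Lévy flights plus nest abandonment) followed in each iteration by a genetic phase with tournament selection, arithmetic crossover, and Gaussian mutation, and it minimizes the sphere benchmark function. All names (`hybrid_cuckoo_ga`, `levy_step`, `sphere`) and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def sphere(x):
    """Standard benchmark function; its minimum is 0 at the origin."""
    return sum(v * v for v in x)


def levy_step(rng, beta=1.5):
    """Draw one Levy-distributed step length using Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0, sigma)
    v = rng.gauss(0, 1)
    return u / max(abs(v), 1e-12) ** (1 / beta)


def clip(x, lo, hi):
    """Keep every coordinate inside the search bounds."""
    return [min(max(v, lo), hi) for v in x]


def hybrid_cuckoo_ga(f, dim=5, n_nests=20, iters=200, pa=0.25,
                     alpha=0.01, pm=0.1, lo=-5.0, hi=5.0, seed=0):
    """Minimize f with cuckoo search plus an assumed GA refinement phase."""
    rng = random.Random(seed)
    nests = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_nests)]
    fit = [f(x) for x in nests]
    history = []  # best fitness after each iteration

    for _ in range(iters):
        # Cuckoo phase: each nest proposes a Levy-flight move relative to the best nest.
        best = nests[fit.index(min(fit))]
        for i in range(n_nests):
            cand = clip([x + alpha * levy_step(rng) * (x - b)
                         for x, b in zip(nests[i], best)], lo, hi)
            fc = f(cand)
            j = rng.randrange(n_nests)  # compare against a randomly chosen nest
            if fc < fit[j]:
                nests[j], fit[j] = cand, fc

        # Abandon nests with probability pa (never the current best) and rebuild at random.
        best_i = fit.index(min(fit))
        for i in range(n_nests):
            if i != best_i and rng.random() < pa:
                nests[i] = [rng.uniform(lo, hi) for _ in range(dim)]
                fit[i] = f(nests[i])

        # GA phase (assumed form): tournament selection, arithmetic crossover,
        # Gaussian mutation; a child replaces nest i only if it is better.
        for i in range(n_nests):
            p1 = min(rng.sample(range(n_nests), 2), key=fit.__getitem__)
            p2 = min(rng.sample(range(n_nests), 2), key=fit.__getitem__)
            w = rng.random()
            child = [w * a + (1 - w) * b for a, b in zip(nests[p1], nests[p2])]
            child = clip([c + rng.gauss(0, 0.1) if rng.random() < pm else c
                          for c in child], lo, hi)
            fc = f(child)
            if fc < fit[i]:
                nests[i], fit[i] = child, fc

        history.append(min(fit))

    best_i = fit.index(min(fit))
    return nests[best_i], fit[best_i], history
```

Calling `hybrid_cuckoo_ga(sphere)` returns the best position found, its function value, and the per-iteration best values, which can be compared against a run without the GA phase to examine convergence speed. Applying such a search to YARN would additionally require encoding a task-to-node assignment as a solution vector and defining a fitness function over node CPU, memory, and load, neither of which the abstract specifies.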
Keywords/Search Tags: Hadoop, YARN, Job Scheduling, Resource Management, Hybrid Genetic Cuckoo Algorithm