
Research on Adaptive Scheduling of Multiple Types of Jobs in Hadoop

Posted on: 2017-12-17; Degree: Master; Type: Thesis
Country: China; Candidate: B Peng
GTID: 2428330566453030; Subject: Computer Science and Technology
Abstract:
With the rapid development of the Internet, people are increasingly active online and the volume of data on the Internet is growing explosively, so effective techniques and methods for processing this data have become an urgent practical need. In response, Apache developed Hadoop, an open-source distributed computing platform. Hadoop can process large volumes of data efficiently and has gradually become the leading representative of current big data platforms. In recent years, more and more users have run all kinds of jobs on Hadoop, so the types of jobs it handles differ greatly, and this diversity poses major challenges for Hadoop job scheduling. Current Hadoop job scheduling has the following problems: (1) existing scheduling algorithms do not take user expectations into account and therefore cannot meet users' needs well; (2) existing scheduling algorithms do not consider the load of each cluster node during scheduling, which makes it difficult to keep the load balanced; (3) most existing scheduling algorithms target a single job type, whereas real workloads contain many job types, and they do not schedule different types of jobs in different ways.

The scheduling algorithm is the key factor affecting the performance of the Hadoop platform, so this thesis improves Hadoop job scheduling to address these problems. Analysis of the Hadoop scheduling process shows that it can be divided into two stages: job-level scheduling and task-level scheduling. Job-level scheduling selects a job from the queue to run; task-level scheduling divides a job into multiple tasks and distributes those tasks to the cluster nodes for execution. This thesis proposes two adaptive scheduling algorithms that address different requirements at these two levels, the job level and the task level. The main research work is as follows:

(1) The existing capacity scheduling algorithm does not consider user requirements. However, it is stable and mature and already solves the multi-queue scheduling problem. Therefore, building on the capacity scheduling algorithm and taking user expectations and job complexity into account, this thesis proposes an adaptive scheduling algorithm based on user expectations and job complexity, which optimizes scheduling at the job level. Following a "divide and conquer" approach, the K-Means clustering algorithm first divides jobs into multiple queues according to user expectation and job complexity, and the queues are then scheduled according to their priority, improving Hadoop's job processing efficiency, quality of service, and user satisfaction.

(2) The LATE scheduling algorithm cannot keep the cluster load well balanced. To address this, the thesis proposes an adaptive scheduling algorithm based on the workload of tasks and the workload capacity of nodes, which optimizes task-level scheduling in heterogeneous environments. Taking into account the workload types of tasks and the performance differences among cluster nodes, it assigns tasks with different workloads to suitable nodes, making full use of the cluster's performance, shortening task completion time, keeping node workloads balanced, and improving Hadoop's execution efficiency.

(3) A Hadoop platform was built to test the proposed scheduling algorithms and compare them with the original Hadoop scheduling algorithms. Experimental results show that, compared with the capacity scheduling algorithm, the adaptive scheduling algorithm based on user expectations and job complexity reduces job completion time by 10% and increases user satisfaction by 35%; compared with the LATE scheduling algorithm, the adaptive scheduling algorithm based on task workload and node workload capacity reduces completion time by about 18% and keeps the cluster load more balanced.
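The job-level idea in (1), clustering jobs by user expectation and job complexity with K-Means and then serving the resulting queues by priority, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the `Job` fields and their normalization to [0, 1], the farthest-first seeding, and the queue priority rule (higher mean user expectation first, more urgent jobs first inside a queue) are hypothetical choices made for the example, and the sketch does not interact with Hadoop's real scheduler.

```python
import math
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    expectation: float  # hypothetical: normalized user urgency, 1.0 = most urgent
    complexity: float   # hypothetical: normalized job cost, 1.0 = heaviest


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's K-Means; returns (cluster index per point, centroids)."""
    centroids = farthest_first(points, k)
    assign = None
    for _ in range(max_iter):
        new_assign = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assign == assign:
            break  # converged: no point changed cluster
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assign, centroids


def schedule_order(jobs, k=3):
    """Cluster jobs into k queues, then return job names in dispatch order.

    Hypothetical priority rule: queues whose centroid has higher user
    expectation (then lower complexity) go first; inside a queue, more
    urgent jobs go first.
    """
    points = [(j.expectation, j.complexity) for j in jobs]
    assign, centroids = kmeans(points, k)
    queue_order = sorted(range(k), key=lambda c: (-centroids[c][0], centroids[c][1]))
    order = []
    for c in queue_order:
        queue = [j for j, a in zip(jobs, assign) if a == c]
        order.extend(j.name for j in sorted(queue, key=lambda job: -job.expectation))
    return order


if __name__ == "__main__":
    demo = [
        Job("urgent-1", 0.90, 0.10), Job("urgent-2", 0.85, 0.15),
        Job("batch-1", 0.10, 0.90), Job("batch-2", 0.15, 0.85),
        Job("mid-1", 0.50, 0.50), Job("mid-2", 0.55, 0.45),
    ]
    print(schedule_order(demo))
    # ['urgent-1', 'urgent-2', 'mid-2', 'mid-1', 'batch-2', 'batch-1']
```

Farthest-first seeding is used only to keep the example deterministic; any K-Means initialization would fit the described approach, and the real priority rule would come from the thesis itself.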
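The task-level idea in (2), sending tasks of different workload types to nodes whose performance suits them while keeping node load balanced, can be illustrated with a greedy earliest-finish-time heuristic. This is a swapped-in, minimal sketch, not the thesis algorithm and not LATE: the two workload types ("cpu" and "io"), the per-node processing rates, and the rule of placing heavier tasks first are hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_rate: float  # hypothetical: work units per second on CPU-bound tasks
    io_rate: float   # hypothetical: work units per second on I/O-bound tasks
    busy_until: float = 0.0  # estimated time at which queued work finishes


@dataclass
class Task:
    name: str
    kind: str    # hypothetical workload-type label: "cpu" or "io"
    work: float  # hypothetical amount of work, in the same units as the rates


def finish_time(node, task):
    """Estimated completion time of `task` if it is queued on `node`."""
    rate = node.cpu_rate if task.kind == "cpu" else node.io_rate
    return node.busy_until + task.work / rate


def assign_tasks(tasks, nodes):
    """Greedy earliest-finish-time placement (a swapped-in heuristic):
    heavier tasks are placed first, each on the node that would finish it
    soonest given that node's speed for the task's type and its queued work."""
    plan = {}
    for task in sorted(tasks, key=lambda t: -t.work):
        best = min(nodes, key=lambda n: finish_time(n, task))
        best.busy_until = finish_time(best, task)
        plan[task.name] = best.name
    return plan


if __name__ == "__main__":
    cluster = [
        Node("cpu-node", cpu_rate=4.0, io_rate=1.0),
        Node("io-node", cpu_rate=1.0, io_rate=4.0),
    ]
    tasks = [
        Task("map-sort", "cpu", 8.0), Task("map-scan", "io", 8.0),
        Task("reduce-agg", "cpu", 4.0), Task("reduce-write", "io", 4.0),
    ]
    print(assign_tasks(tasks, cluster))
    print([(n.name, n.busy_until) for n in cluster])
```

With one CPU-oriented and one I/O-oriented node, each task lands on the node suited to its type and both nodes reach the same estimated finish time of 3.0, which is the kind of workload balance the abstract aims for.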
Keywords: Hadoop scheduling, job types, user expectation, load balancing, adaptive scheduling