
Task Scheduling Optimization Based On Time And Load Balance Under The Hadoop Platform

Posted on: 2019-01-08  Degree: Master  Type: Thesis
Country: China  Candidate: Z G Pan
GTID: 2438330596455304  Subject: Computer Science and Technology
Abstract/Summary:
With the advance of "Internet plus", many large, medium-sized and small enterprises are actively restructuring in response, and these enterprises produce huge amounts of data every day. Storing and processing such massive data requires a new computing model, and cloud computing has emerged to meet this need. Hadoop is the most widely used cloud computing platform, and it is also the platform researchers most often use to study big data. MapReduce is Hadoop's distributed computing framework, and its performance directly determines the performance of a Hadoop cluster. Research on MapReduce job scheduling algorithms is therefore of great significance for improving cluster performance. In existing scheduling algorithms, scheduling in the Reduce stage is relatively simple and has two defects: low execution efficiency for small jobs, and data skew. Based on a study of the MapReduce execution process and its scheduling algorithms, this thesis proposes a Reduce task scheduling algorithm based on task time and load balance. The algorithm starts sampling keys when a job's Map phase begins, and estimates the remaining completion time of the Map phase from the time the Map phase has used so far and other related information. Reduce tasks in the waiting queue are then reordered by comparing these remaining Map completion times, which improves the execution efficiency of small jobs' Reduce tasks. From the sampled keys, the frequency of each key, and hence the data distribution, can be estimated. The data are then assigned to Reduce nodes with a greedy algorithm, which addresses data skew and achieves load balancing. Finally, experiments comparing the proposed algorithm with Hadoop's built-in fair scheduling algorithm and with a quantile partitioning algorithm show that it not only improves the execution efficiency of small jobs' Reduce tasks but also achieves better load balance and reduces job execution time.
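The abstract states that the remaining completion time of a job's Map phase is estimated from the time already used "and other related information", and that waiting Reduce tasks are reordered by comparing these estimates. The following is a minimal sketch, assuming a simple linear extrapolation from the Map phase's progress fraction and a shortest-remaining-Map-time-first ordering; the thesis does not give its exact formula, and the names `ReduceTask`, `estimate_remaining_map_time` and `reorder_waiting_reduces` are hypothetical, not part of the thesis or of Hadoop's API.

```python
from dataclasses import dataclass


@dataclass
class ReduceTask:
    job_id: str
    task_id: str


def estimate_remaining_map_time(elapsed: float, progress: float) -> float:
    """ASSUMED linear estimate: if a fraction `progress` of the Map phase took
    `elapsed` seconds, the remaining fraction is assumed to take proportionally long."""
    if not 0.0 < progress <= 1.0:
        raise ValueError("progress must be in (0, 1]")
    return elapsed * (1.0 - progress) / progress


def reorder_waiting_reduces(queue, map_stats):
    """Sort waiting Reduce tasks so that jobs whose Map phase is expected to
    finish soonest come first (ASSUMED ordering rule).
    `map_stats` maps job_id -> (elapsed_seconds, map_progress_fraction)."""
    def remaining(task):
        elapsed, progress = map_stats[task.job_id]
        return estimate_remaining_map_time(elapsed, progress)
    return sorted(queue, key=remaining)


if __name__ == "__main__":
    # A large job 20% through its Map phase vs. a small job 75% through.
    stats = {"big": (120.0, 0.2), "small": (30.0, 0.75)}
    waiting = [ReduceTask("big", "r0"), ReduceTask("small", "r0"), ReduceTask("big", "r1")]
    print([t.job_id for t in reorder_waiting_reduces(waiting, stats)])
    # prints ['small', 'big', 'big']
```

Here the small job is estimated to need about 10 more seconds of Map work versus about 480 for the large one, so its Reduce task moves to the front. Because Python's `sorted` is stable, Reduce tasks belonging to the same job keep their original queue order.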
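The key-sampling and greedy partitioning step can likewise be sketched in Python. This sketch assumes Bernoulli sampling at a fixed rate with counts scaled back up to estimate key frequencies, and uses the classic longest-processing-time greedy rule: assign the heaviest remaining key to the Reduce node with the smallest current estimated load. The thesis only says that a greedy algorithm is used, so both the sampling scheme and this particular greedy rule are assumptions, and `sample_key_frequencies` and `greedy_partition` are hypothetical names.

```python
import heapq
import random
from collections import Counter


def sample_key_frequencies(keys, rate, seed=0):
    """ASSUMED scheme: keep each key with probability `rate`, then divide the
    sampled counts by `rate` to estimate each key's full frequency."""
    rng = random.Random(seed)
    sampled = Counter(k for k in keys if rng.random() < rate)
    return {k: c / rate for k, c in sampled.items()}


def greedy_partition(freqs, num_reducers):
    """Longest-processing-time greedy: visit keys from heaviest to lightest and
    give each one to the reducer with the smallest estimated load so far."""
    heap = [(0.0, r) for r in range(num_reducers)]  # (current load, reducer index)
    heapq.heapify(heap)
    loads = [0.0] * num_reducers
    assignment = {}
    # Ties in weight are broken by the key's string form for determinism.
    for key, weight in sorted(freqs.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        load, r = heapq.heappop(heap)
        assignment[key] = r
        loads[r] = load + weight
        heapq.heappush(heap, (loads[r], r))
    return assignment, loads


if __name__ == "__main__":
    freqs = {"a": 50, "b": 30, "c": 20, "d": 20, "e": 10}
    assignment, loads = greedy_partition(freqs, 3)
    print(assignment)  # prints {'a': 0, 'b': 1, 'c': 2, 'd': 2, 'e': 1}
    print(loads)       # prints [50.0, 40.0, 40.0]
```

Keys that never appear in the sample have no entry in the assignment, so a real partitioner would need a fallback for them, for example hashing the key modulo the number of Reduce tasks as Hadoop's default `HashPartitioner` does.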
Keywords/Search Tags: MapReduce, data skew, small job, load balance, greedy algorithm