| With the continuous development of Internet technology in the twenty-first century, the amount of data in astronomy, biology, high-energy physics, and other areas of scientific research, as well as in computer simulation, Internet applications, electronic commerce, and other fields, is growing geometrically. Traditional data processing technology cannot meet the demands of complex computation on TB- or PB-scale data. MapReduce is one of the most effective programming models for big data processing owing to its simplicity, scalability, and high fault tolerance, and it can effectively monitor parallel data processing programs running in a cluster. However, existing MapReduce platforms lack efficient automatic scheduling algorithms for big data projects that consist of a series of MapReduce jobs with precedence constraints. To solve this problem, researchers are trying to integrate these big data projects into workflows. In this article, a MapReduce workflow is modeled as a set of jobs with precedence constraints, where each job is decomposed into a Map phase and a Reduce phase, each containing multiple tasks. Based on the resources of the computing cluster and the heterogeneity of tasks, we construct a two-level Directed Acyclic Graph (DAG) model based on Job and Task, and propose a heterogeneous scheduling algorithm based on two-level priority ranking (2-MRHS). The algorithm is divided into two stages: priority ranking and task assignment. First, the scheduling queue of tasks is obtained by calculating the priority weights at the Job level and the Task level, respectively; then, on the basis of each task's Earliest Finish Time (EFT), the data-block subtasks included in each task are assigned to appropriate computing nodes. Experimental results show that the algorithm achieves a shorter schedule length (makespan), a higher optimization ratio, and better stability than other related algorithms. Because a unified large-scale experimental platform for task scheduling is currently lacking, this paper builds a DAG scheduling simulation system based on a layer-by-layer task graph generation algorithm and defines the parameters of the DAG model used in workflow task scheduling research. We further extend the system with the two-level DAG model of the MapReduce workflow, which lays an experimental foundation for research on related scheduling algorithms. |
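
The two stages named in the abstract, priority ranking followed by EFT-based assignment, can be illustrated with a minimal single-level list-scheduling sketch. This is not the 2-MRHS algorithm itself: it uses a generic upward-rank priority on one DAG level, ignores data transfer costs, and omits the Job-level weights and data-block subtasks. The function names `upward_rank` and `schedule` and the input format `cost[task][node]` are assumptions made for illustration only.

```python
from collections import defaultdict


def upward_rank(succ, avg_cost):
    """Priority weight: average cost plus the longest path to an exit task."""
    rank = {}

    def visit(task):
        if task not in rank:
            tail = max((visit(s) for s in succ.get(task, [])), default=0.0)
            rank[task] = avg_cost[task] + tail
        return rank[task]

    for task in avg_cost:
        visit(task)
    return rank


def schedule(succ, cost):
    """Assign each task to the node that gives the earliest finish time.

    succ: task -> list of successor tasks (must form a DAG).
    cost: task -> {node: run time}; hypothetical input format,
          all run times assumed positive, transfer costs ignored.
    """
    avg = {t: sum(c.values()) / len(c) for t, c in cost.items()}
    rank = upward_rank(succ, avg)

    pred = defaultdict(list)
    for task, successors in succ.items():
        for s in successors:
            pred[s].append(task)

    node_free = {n: 0.0 for c in cost.values() for n in c}
    finish, placement = {}, {}

    # Stage 1: with positive costs, decreasing rank is a valid topological order.
    for task in sorted(cost, key=lambda t: -rank[t]):
        ready = max((finish[p] for p in pred[task]), default=0.0)
        # Stage 2: pick the node with the earliest finish time (EFT).
        best = min(cost[task],
                   key=lambda n: max(ready, node_free[n]) + cost[task][n])
        finish[task] = max(ready, node_free[best]) + cost[task][best]
        node_free[best] = finish[task]
        placement[task] = best

    return placement, finish, max(finish.values())
```

Calling `schedule(succ, cost)` returns the chosen node per task, each task's finish time, and the makespan, the schedule-length metric that the experiments compare. In the two-level setting described above, the sort key would combine a Job-level weight with the Task-level weight rather than use a single rank.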