With the rapid development of Internet technology and the popularity of data-intensive applications, the volume of Internet data has grown explosively. Cloud computing emerged and developed rapidly to address the problem of storing and processing such large-scale data, and Hadoop, as a mainstream cloud computing platform, has attracted much attention. Scheduling, as one of the key factors affecting Hadoop cluster performance, has become a hot research topic. In cloud computing platforms, the nodes of a cluster are interconnected via a network, and limited bandwidth often becomes the bottleneck. Therefore, how to assign tasks fairly while reducing data transmission between nodes is a key problem. Following the principle that "moving computation is more efficient than moving data", and on the premise that each job fairly shares the cluster's resources, improving data locality (assigning a task to the node that holds its input data, to reduce the cost of network transmission) improves system performance and throughput. Delay scheduling is a common method for improving data locality and cluster performance. However, current delay scheduling algorithms are based on a fixed waiting time and do not consider the load balance of the cluster. Therefore, this paper proposes a load-balance-based dynamic delay scheduling mechanism (DDS). DDS uses gray prediction to predict the future arrival rates of idle nodes. Considering the load state of the cluster and the progress of each job, DDS assigns each job a reasonable delay waiting time, avoiding futile waiting. Task scheduling takes full account of node workload in order to balance load and to avoid slow task execution or even failure caused by overloading, thereby shortening job completion time. Experimental results show that DDS performs better than the traditional delay scheduling algorithm in terms of job completion time and load balance.
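The gray prediction step mentioned above can be illustrated with a minimal sketch. The abstract does not say which gray model DDS uses, so this example assumes the classic GM(1,1) model, applied to a short history of idle-node arrival rates. The function name `gm11_forecast`, its parameters, and the sample values are hypothetical and are not taken from the paper.

```python
import math


def gm11_forecast(history, steps=1):
    """Forecast the next `steps` values of a positive series with GM(1,1).

    Assumption: GM(1,1) stands in for the unspecified gray model in the text.
    """
    n = len(history)
    if n < 4:
        raise ValueError("GM(1,1) needs at least 4 observations")
    if any(x <= 0 for x in history):
        raise ValueError("GM(1,1) assumes strictly positive observations")

    # Accumulated generating operation (1-AGO): running sums of the raw series.
    x1 = []
    total = 0.0
    for x in history:
        total += x
        x1.append(total)

    # Background values: means of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = history[1:]

    # Least-squares fit of y[k] = -a * z[k] + b (simple linear regression).
    m = len(z)
    sum_z = sum(z)
    sum_y = sum(y)
    sum_zz = sum(v * v for v in z)
    sum_zy = sum(v * w for v, w in zip(z, y))
    slope = (m * sum_zy - sum_z * sum_y) / (m * sum_zz - sum_z ** 2)
    b = (sum_y - slope * sum_z) / m
    a = -slope

    # Degenerate case: no growth or decay, so the model reduces to a constant.
    if abs(a) < 1e-12:
        return [b] * steps

    def x1_hat(k):
        # Time-response function of the fitted model for the accumulated series.
        return (history[0] - b / a) * math.exp(-a * k) + b / a

    # Differencing the accumulated forecasts restores the original scale.
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]


if __name__ == "__main__":
    # Hypothetical idle-node arrival rates observed in recent time windows.
    arrivals = [10, 12, 14.4, 17.28, 20.736]
    print(gm11_forecast(arrivals, steps=2))
```

A scheduler could feed such a forecast into its choice of waiting time, for example waiting longer when many idle nodes are expected to arrive soon. How DDS actually combines the prediction with cluster load and job progress is not specified in this abstract.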