
Optimizing Network Flow Scheduling For MapReduce Systems With Macroflow Abstraction In Datacenters

Posted on: 2019-01-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Tang
Full Text: PDF
GTID: 2348330545476681
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of computer technologies in recent years, many applications generate massive amounts of data in use. These data are processed and analyzed by big data parallel computing systems such as MapReduce to support scientific decisions. Scheduling policies at the granularity of individual network flows are the traditional method network researchers use to optimize job completion time in MapReduce systems. Later, some scholars proposed scheduling policies based on the Coflow abstraction and proved that the Coflow abstraction outperforms flow-level granularity under certain conditions. However, we find that researchers have generally placed too much emphasis on the network performance of the system and have ignored the effect of machine slot time on job completion time, so their scheduling policies cannot achieve better results. The Macroflow abstraction proposed in this paper can describe the Reducer's machine slot time in MapReduce systems. We prove that minimizing machine slot time is equivalent to minimizing the average Macroflow completion time, and that the latter problem is NP-hard. We propose three heuristic greedy scheduling policies based on the Macroflow abstraction. Two of them assume that network flow sizes can be known in advance, through logs or other means; the third can be used in scenarios where network flow sizes are unknown. With priority discretization, all three policies require only 4 to 8 priority levels to operate normally. This fits within the absolute priority range supported by commercial network devices, so the policies can be deployed in real data centers. System simulations and small-scale testbed results show that scheduling policies based on the Macroflow abstraction can significantly reduce the completion time of both network-intensive and computation-intensive jobs in MapReduce systems. In fact, users do not care about the details of network performance; they only care about whether the job can be completed as soon as possible. The most important significance of this work is that the Macroflow abstraction bridges the performance measurement gap between applications and networks.
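
The abstract does not spell out the policies themselves, so the following is only a rough sketch of how a size-aware greedy policy with priority discretization might look. It assumes that a Macroflow is the set of flows destined for one Reducer, that smaller Macroflows are served first (a shortest-first heuristic chosen for illustration, not necessarily one of the thesis's three policies), and that ranks are mapped to priority levels by equal-width bucketing with level 0 as the highest priority. The names `Flow`, `group_into_macroflows`, and `assign_priorities` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical flow record: (flow_id, reducer_id, size_bytes).
Flow = Tuple[str, str, int]


def group_into_macroflows(flows: List[Flow]) -> Dict[str, List[Tuple[str, int]]]:
    """Group flows by destination Reducer (assumed definition of a Macroflow)."""
    macroflows: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for flow_id, reducer_id, size in flows:
        macroflows[reducer_id].append((flow_id, size))
    return dict(macroflows)


def assign_priorities(flows: List[Flow], num_levels: int = 8) -> Dict[str, int]:
    """Map each Macroflow to one of num_levels discrete priority levels.

    Assumption: Macroflows are ordered by total bytes, smallest first, and the
    resulting ranks are split into equal-width buckets. Level 0 is the highest
    priority.
    """
    macroflows = group_into_macroflows(flows)
    totals = {r: sum(size for _, size in fl) for r, fl in macroflows.items()}
    # Smallest total first; ties are broken by Reducer id for determinism.
    order = sorted(totals, key=lambda r: (totals[r], r))
    n = len(order)
    # rank < n guarantees the computed level stays below num_levels.
    return {r: rank * num_levels // n for rank, r in enumerate(order)}


if __name__ == "__main__":
    demo = [("f1", "r1", 100), ("f2", "r1", 50), ("f3", "r2", 10), ("f4", "r3", 500)]
    print(assign_priorities(demo, num_levels=4))
```

Every flow belonging to a Macroflow would then be tagged with that Macroflow's level, so a switch with 4 to 8 priority queues could enforce the ordering without tracking individual flows.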
Keywords/Search Tags: MapReduce System, Macroflow Abstraction, Network Flow Scheduling