Network Coflow Scheduling For Big Data

Posted on: 2018-10-03
Degree: Master
Type: Thesis
Country: China
Candidate: J Liu
GTID: 2428330512498205
Subject: Computer Science and Technology
Abstract/Summary:
With the advent of big data and the growth of cloud applications, increasing network traffic poses severe challenges for flow scheduling in data center networks. Data centers today commonly run MapReduce, Spark, and other large-scale data processing platforms for data processing and analysis. In MapReduce, the Reduce phase must pull the results of all Map tasks from the nodes where those tasks ran, and the completion time of these cross-node data flows should be optimized. As cloud computing and the mobile Internet develop, data centers are growing rapidly, and the problem of network flow scheduling in data centers needs further research.

The minimum-bottleneck scheduling algorithm is commonly used for Coflow scheduling in data centers. While the Coflow with the current minimum bottleneck is being scheduled, the network often has residual bandwidth. A key issue in allocating this residual bandwidth is the order in which Coflows claim it, that is, which Coflow gets priority on the remaining bandwidth. The most common method computes the Coflows that best fill the current residual bandwidth. However, this method does not consider that when a flow completes, the bandwidth it occupies on its links is released, so the bottleneck of a Coflow computed under the residual-bandwidth constraint may not be its real bottleneck. As a result, the Coflow with the minimum completion time cannot make full use of the residual bandwidth to reduce its bottleneck, and its completion time leaves considerable room for optimization. To address these problems, this thesis studies the bandwidth allocation problem and proposes a two-stage method. In the first stage, the Coflows are sorted and the optimal Coflow queue is computed. In the second stage, when bandwidth is allocated to each Coflow, as much bandwidth as possible is left for other Coflows without extending the completion time of this Coflow. This effectively reduces the average Coflow completion time.

In addition, applications in the data center differ in time sensitivity. User-facing applications, such as search and recommendation services, usually have strict latency requirements, whereas background applications, such as offline analysis of business data, are not time-sensitive. The network flows generated by these two types of application are referred to as mix-flow. For the mix-flow scheduling problem in data centers, this thesis presents a replacement scheduling method based on stable matching theory. A Coflow without a deadline yields its current link bandwidth to a flow with a deadline, ensuring that the latter completes before its deadline. The method still leaves flows without deadlines enough bandwidth and ensures that this bandwidth is used to reduce their average completion time.
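
The two stages described in the abstract (order the Coflows, then give each one only as much bandwidth as it needs to finish at its bottleneck time) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: it assumes a non-blocking fabric in which only sender ("in") and receiver ("out") ports constrain bandwidth, it orders Coflows by smallest bottleneck first, and the names `port_load`, `bottleneck`, and `allocate` are hypothetical.

```python
from collections import defaultdict


def port_load(coflow):
    """Sum the data each sender ("in") and receiver ("out") port must carry."""
    load = defaultdict(float)
    for (src, dst), size in coflow.items():
        load[("in", src)] += size
        load[("out", dst)] += size
    return load


def bottleneck(coflow, capacity):
    """Completion time of the coflow if it alone used the given port capacities."""
    times = []
    for port, data in port_load(coflow).items():
        cap = capacity.get(port, 0.0)
        times.append(data / cap if cap > 0 else float("inf"))
    return max(times)


def allocate(coflows, capacity):
    """Stage 1: order coflows by bottleneck. Stage 2: give each flow the
    lowest rate that still finishes its coflow at the bottleneck time, so the
    rest of the bandwidth stays available to coflows later in the queue."""
    remaining = dict(capacity)
    order = sorted(coflows, key=lambda name: bottleneck(coflows[name], capacity))
    rates = {}
    for name in order:
        gamma = bottleneck(coflows[name], remaining)
        if gamma == float("inf"):
            # Some port this coflow needs is exhausted; it waits for now.
            rates[name] = {flow: 0.0 for flow in coflows[name]}
            continue
        rates[name] = {flow: size / gamma for flow, size in coflows[name].items()}
        for (src, dst), rate in rates[name].items():
            remaining[("in", src)] -= rate
            remaining[("out", dst)] -= rate
    return order, rates, remaining


# Hypothetical example: two senders (a, b), two receivers (x, y), unit capacity.
capacity = {("in", "a"): 1.0, ("in", "b"): 1.0, ("out", "x"): 1.0, ("out", "y"): 1.0}
coflows = {
    "A": {("a", "x"): 2.0, ("b", "x"): 2.0},
    "B": {("a", "y"): 1.0, ("b", "y"): 1.0},
}
order, rates, remaining = allocate(coflows, capacity)
print(order, rates)
```

In this example, Coflow B has the smaller bottleneck and is served first. Because its flows run at rate 0.5 rather than at full port speed, Coflow A can start immediately on the bandwidth left over instead of waiting for B to finish. A full scheduler would rerun the allocation whenever a flow completes and its bandwidth is released, which is the effect the abstract highlights.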
Keywords/Search Tags:Data center, Coflow scheduling, Residual bandwidth allocation, Stable matching