MapReduce Job Oriented Collaborative Optimization On Cloud Data Center Network Resource

Posted on: 2018-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
GTID: 2348330536960919
Subject: Computer software and theory
Abstract/Summary:
As one of the most important computation models for processing large-scale data, MapReduce has been widely used in information technology, data mining, artificial intelligence, mathematical computing and other fields, owing to its easy programming, good scalability and high fault tolerance. The network transmission stage of a MapReduce application consumes a large amount of network bandwidth in the cloud data center. The heavy network load produced by this data transmission not only causes network congestion but also harms the performance of the applications themselves. It has therefore become an urgent problem to let different applications share network resources, avoid bandwidth competition and network congestion, and reduce job completion time while guaranteeing service quality.

However, existing studies on optimizing the network resources of cloud data centers are not comprehensive. The network lacks a traffic identification system, so the properties of applications and their particular network requirements at different levels are ignored. In addition, current traffic scheduling methods do not take the dependencies between coflows into consideration, which seriously reduces the effectiveness of traffic scheduling in shortening job completion time. At the application layer, existing task placement and scheduling methods cannot optimize reasonably for real-time network status and node capacity.

Accordingly, this thesis addresses the MapReduce job oriented cloud data center network resource sharing problem from the bottom up. The main contributions are as follows. First, this thesis proposes a flow marking and recognition mechanism based on the OpenFlow protocol. It distinguishes the data flows of different applications by using the netfilter technique to change the ToS field of the IPv4 header. Second, this thesis studies the scheduling of interdependent coflows under job deadline constraints, formulates a deadline- and dependency-aware job scheduling problem that minimizes the average job completion time, and proposes a two-level scheduling algorithm. At the job level, the algorithm first allocates bandwidth for the bottleneck time slot and link. At the intra-job level, it combines priority scheduling with weighted fair scheduling to satisfy the different dependencies between coflows. Last, this thesis considers the influence of transmission data size, rate, distance and node computing capacity on Shuffle transmission, and proposes an optimization model that reduces the transmission cost of the Shuffle stage as much as possible by placing Reduce tasks more reasonably.

Simulation results show that, compared with the traditional shortest-job-first method, the two-level scheduling method reduces job completion time by up to 18% and at the same time accommodates 21% more jobs with their deadlines guaranteed. The Min-cost task placement algorithm achieves a more reasonable task placement ratio than the fair algorithm and the locality algorithm. Thus, this thesis achieves the goal of optimizing the network resources of a MapReduce job oriented cloud data center.
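
The flow-marking idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the thesis rewrites the ToS field with netfilter, whereas this sketch sets the byte from the sending application through the standard `IP_TOS` socket option. The application class names and code points in `APP_DSCP` are hypothetical, chosen only for illustration. The sketch uses the common convention that the upper six bits of the ToS byte carry the class marking and the lower two bits are left for ECN.

```python
import socket

# Hypothetical mapping from application class to a 6-bit code point.
# The thesis does not list the values it actually uses.
APP_DSCP = {
    "mapreduce_shuffle": 10,
    "interactive": 46,
    "background": 8,
}


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit code point in the upper six bits of the ToS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("code point must fit in 6 bits")
    return dscp << 2


def tos_to_dscp(tos: int) -> int:
    """Recover the code point from a ToS byte, ignoring the two ECN bits."""
    return (tos & 0xFF) >> 2


def classify(tos: int) -> str:
    """Map a ToS byte back to an application class, as a recognizer might."""
    dscp = tos_to_dscp(tos)
    for app, code in APP_DSCP.items():
        if code == dscp:
            return app
    return "unknown"


def open_marked_socket(app: str) -> socket.socket:
    """Create a TCP socket whose outgoing IPv4 packets carry the app's marking."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(APP_DSCP[app]))
    return sock
```

In a setup like the one the abstract describes, a switch or controller could then match on the marked field to tell the flows of different applications apart; `classify` stands in for that recognition step here.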
Keywords/Search Tags:MapReduce, Flow Marking, Coflow Scheduling, Task Placement