Joint Optimization Of Network Computing For Big Data System

Posted on: 2022-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: L L Qin
Full Text: PDF
GTID: 2518306725981309
Subject: Computer technology
Abstract/Summary:
Datacenters play a vital role in the development of cloud computing. In today's datacenters, cluster computing is widely used for data processing and analysis because of its high performance and low computing cost. However, existing network-level optimizations do not match the performance requirements of cluster computing applications. For mainstream data-parallel frameworks such as Hadoop and Spark, network communication is highly structured. These frameworks usually implement a data-parallel computing model in which each job passes through a series of communication stages before producing the final result. In each communication stage, parallel flows exchange data among a group of hosts, and the stage usually completes only after all of its flows have completed. Coflow is a network abstraction proposed for this all-or-nothing, application-level semantics: a Coflow is the set of related flows in one communication stage, and it is complete only when all of its flows have completed. This abstraction narrows the gap between application-level semantics and network-level optimization.

Improving Coflow-based transmission performance raises many key issues. The most widely studied is Coflow scheduling, whose most common goal is to minimize the average Coflow completion time. To address the shortcomings of current related work, this thesis studies Coflow scheduling for network communication optimization, with the goal of minimizing Coflow Completion Time (CCT). The main research content is divided into the following two parts:

1. The thesis develops Django, a distributed bilateral Coflow scheduling framework. Using packet loss rate as a key feature, it builds a prediction model to estimate the optimal number of concurrent connections during flow transmission. The thesis chooses the Support Vector Machine (SVM) as the machine learning model and uses the C-SVC model as the multi-class classifier. The results show that the model predicts the number of connections with an accuracy of 97%. The thesis then proposes a series of bilateral scheduling algorithms in which sender and receiver hosts interact independently and asynchronously. Taking the optimal number of concurrent connections as input, the receiver host opens or closes connections to the sender hosts to implement Coflow scheduling. Testbed experiments and large-scale simulation experiments on NS-3 show that the algorithm reduces the average CCT and tail CCT by 15% and 40%, respectively.

2. The thesis implements an online framework for joint reducer placement and Coflow bandwidth scheduling to minimize the average CCT. The core idea is to minimize the completion time of a single Coflow through reducer placement and control of the traffic transmission rate, and then to schedule all Coflows according to the shortest-remaining-time-first principle. Because this problem is NP-hard, the thesis proposes a 2-approximation algorithm. Testbed implementation and large-scale simulation show that the framework reduces the average CCT by 64.98% compared with the state of the art.
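The two definitions the abstract relies on (a Coflow finishes only when its last flow finishes, and Coflows are served shortest-remaining-time-first) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm. It assumes a single shared link, all Coflows present at time zero, and one Coflow served at a time. The names `Coflow`, `coflow_completion_time`, and `shortest_first_ccts` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Coflow:
    """Hypothetical container: a named group of flows, sizes in arbitrary units."""
    name: str
    flow_sizes: List[float]


def coflow_completion_time(flow_finish_times: List[float]) -> float:
    """All-or-nothing semantics: the Coflow completes when its slowest flow does."""
    return max(flow_finish_times)


def shortest_first_ccts(coflows: List[Coflow],
                        link_rate: float) -> Tuple[Dict[str, float], float]:
    """Serve Coflows one at a time on one link, least remaining data first.

    Simplifying assumption: no arrivals after time zero, so the remaining
    time of a Coflow is just its total size divided by the link rate.
    Returns the per-Coflow CCT and the average CCT.
    """
    order = sorted(coflows, key=lambda c: sum(c.flow_sizes))
    clock = 0.0
    ccts: Dict[str, float] = {}
    for coflow in order:
        clock += sum(coflow.flow_sizes) / link_rate
        ccts[coflow.name] = clock
    return ccts, sum(ccts.values()) / len(ccts)


# Illustrative input only; these sizes are not data from the thesis.
example = [Coflow("A", [4.0, 6.0]), Coflow("B", [1.0, 2.0]), Coflow("C", [5.0])]
ccts, average_cct = shortest_first_ccts(example, link_rate=1.0)
```

In this toy setting, serving the smallest Coflow first gives a lower average CCT than serving them in the listed order, which is the intuition behind the shortest-remaining-time-first rule. The thesis's framework additionally handles reducer placement and rate control, which this sketch leaves out.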
Keywords/Search Tags:Coflow Scheduling, Datacenter, Completion Time, Prediction, Reducer Placement