Font Size: a A A

The Theory And Technology On Coded Distributed Computation

Posted on:2021-05-24Degree:MasterType:Thesis
Country:ChinaCandidate:X YangFull Text:PDF
GTID:2518306476450004Subject:Communication and Information System
Abstract/Summary:PDF Full Text Request
Distributed computing has became an important solution to deal with the problem of massive data and large scale computing.When the scale of distributed computing system increases,it will inevitably encounter the problem of straggler and communication bottlenecks.This paper provides insights on how coded solutions work in the areas of straggler mitigation and communication optimization.First of all,this paper will introduce the models of coded computation and coded shuffling as well as the coding technologies like fountain codes and MDS codes.Secondly,this paper introduces the coding schemes which can be utilized in the distributed matrix vector multiplication and distributed matrix multiplication to mitigate the effect of stragglers.Then,the corresponding coding schemes for communication optimization are introduced in the context of Map Reduce and random shuffle of distributed machine learning.Finally,the coding scheme considering both the stragglers mitigation and the communation optimization is discussed.In order to alleviate the influence of the stragglers in distributed matrix vector multiplication,the MDS code schemes and a series of fountain code schemes are introduced.For the MDS code scheme,the fast decoding scheme based on Walsh transformation is introduced.Besides,the LT coding with BP decoding and the Raptor coding with inactivation decoding are introduced.What's more,the coding scheme with feedback is proposed,which greatly improves the performance in the areas of decoding overhead and system delay.For the above coding schemes,the delay of Map phase in each scheme is derived theoretically.With the help computer simulation,the storage loads and the computation delay of each coding scheme are analyzed.For the straggler problem in the distributed matrix multiplication,there are five coding schemes introduced in this paper.We introduce the concept of recovery limit as the major index to evaluate the performance of each scheme.There are two common schemes referred to as naive matrix multiplication and blocked matrix multiplication.For the naive matrix multiplication,the product code scheme based on MDS codes,the polynomial coding scheme and the Mat Dot coding scheme are introduced.For the blocked matrix multiplication,the polydot coding scheme and the reduced recovery coding scheme are introduced.The polydot code is construceted on the basis of the polynomial code and the Mat Dot code.It is designed to achieve a trade-off between the communication load and the recovery limit.When the elements of the matrix are nonnegative integers,the reduced recovery coding scheme can be utilized for further reducation on the recovery limit.After that,this paper introduces the application of polydot coding scheme in the training process of distributed DNN network.In order to unify the decoding schemes in forward and backward propagations of DNN network,an interpolation encoding scheme for binary polynomials is proposed.For the communication bottlenecks in the distributed system,several coding schemes and the communication load of each coding scheme are presented in this paper.For the coding scheme under the Map Reduce scenario,the coding strategy for each group of codeable sets is proposed to achieve the least coded messages.Finally,a coding system considering the communication optimization and stragglers is introduced in the context of distributed matrix vector multiplication.The coding system based on the MDS code and repetition code can obtain different communication loads and Map by adjusting parameters.
Keywords/Search Tags:Distributed computation, Straggler mitigation, Coded shuffle, Erasure code
PDF Full Text Request
Related items