Font Size: a A A

Constructions Of Cascaded Coded Distributed Computing Schemes Based On Placement Delivery Arrays

Posted on:2022-01-01Degree:MasterType:Thesis
Country:ChinaCandidate:L X QuFull Text:PDF
GTID:2518306485486204Subject:Software engineering
Abstract/Summary:PDF Full Text Request
As the scale of data analysis and processing tasks increases,the need to speed up the computing process increases dramatically.Distributed computing is a computing method relative to centralized computing,which distributes computing tasks from centralized computing on one device to distributed computing on multiple devices in the network,thus speeding up the computing process.It can handle large-scale data analysis tasks,such as machine learning,neural network learning and so on.Map Reduce,a computing framework commonly used in distributed computing systems,has been widely used to solve many large-scale data processing problems.In the Map Reduce framework,the overall computation is divided into three phases: the Map phase,the Shuffle phase,and the Reduce phase.(1)In the Map phase,the distributed computing node processes part of the input data locally,calculates according to the designed Map function,and generates some intermediate values.(2)In the Shuffle phase,the nodes exchange the calculated intermediate values with each other to meet the needs of the next phase of calculation.(3)In the Reduce phase,nodes use the intermediate values locally calculated in the Map phase and the intermediate values transmitted by other nodes in the Shuffle phase,and use their respective designed Reduce functions to distributedly compute the desired output values.However,during the whole computing process,data transmission between nodes in the Shuffle phase will generate communication load,which limits the application performance of distributed computing.It is well known that encoding can cope with channel uncertainty in remote communication and can also be used to reduce storage costs in distributed storage systems and cache networks.Therefore,the application of coding in distributed computing can greatly reduce the communication load generated in the process of data transmission,thus speeding up the entire computing process,that is,coded distributed computing.In addition,in the actual network environment,not all nodes can normally complete the computing task,such nodes are straggling nodes.Therefore,in addition to the cost of data transmission,another bottleneck in the execution of coded distributed computing is the straggling nodes.Li et al.introduced a coded distributed computing scheme to reduce the communication load of general distributed computing frameworks such as Map Reduce.They also propose a cascaded coded distributed computing scheme that performs multiple computations on each output function,and prove that the scheme realizes the tradeoff between the computation load and the communication load.However,when the number of computing nodes becomes large,these schemes require an exponential number of input files and output functions,making them difficult to apply in practice.This is the problem we need to solve.In this paper,we construct a class of cascaded coded distributed computing scheme by using the structure of placement delivery array,and greatly reduce the number of input files and output functions.Specifically,given a known placement delivery array,a class of cascaded coded distributed computing schemes can be obtained.We present three new schemes and analyze their performance.It is proved that the number of output functions of all the new schemes is only a factor of the number of computing nodes,and the number of input files of the new scheme is less than the number of input files of the cascaded coded distributed computing scheme derived by Li et al.At the same time,on the basis of the cascaded coded distributed computing scheme of proposed by Li et al.,we consider the case involving the straggling nodes.That is,cascaded coded distributed computing with straggling nodes,in which we perform multiple computations on each output function.By changing the number of segments of splitting the intermediate values,we construct a cascaded coded distributed computing scheme with straggling nodes.However,in most known coded distributed computing schemes for dealing with straggling nodes,each output function is calculated once.At the same time,most of the known coded distributed computing schemes that split the intermediate values have the same number of segments in the intermediate values.Therefore,in our scheme to deal with the straggling nodes,we calculate each output function several times and set different number of segments of the intermediate value.We also perform performance analysis on the scheme,and prove that the communication load generated by the cascaded coded distributed computing scheme with straggling nodes in this paper is slightly larger than the optimal communication load in the coded distributed computing scheme proposed by Li et al.,and in most cases,the ratio of the two is less than 2.
Keywords/Search Tags:coded distributed computing, MapReduce, placement delivery array, straggling nodes
PDF Full Text Request
Related items