Font Size: a A A

Research On Optimization Of Communication Performance For Coded MapReduce

Posted on:2022-10-02Degree:MasterType:Thesis
Country:ChinaCandidate:Y M DongFull Text:PDF
GTID:2518306725492984Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
MapReduce is a commonly used large-scale distributed computing paradigm.In its Shuffle phase,computing nodes need to exchange a large number of intermediate values with each other,which leads to a high communication load,and as the scale of the cluster expands,the communication bottleneck problem will become more serious.In recent years,researchers have proposed a coded MapReduce framework,which can greatly reduce the communication load by introducing computational redundancy,and using computing nodes to perform XOR coding on intermediate values and then multicast.However,the existing coded MapReduce framework is mainly optimized for scenarios where the byte sizes of intermediate values are equal and the network bandwidth is homogeneous.When it is used in an application scenario with a heterogeneous byte size of intermediate values or a network environment with a heterogeneous bandwidth,its performance will be severely affected.In response to the above problems,this article focuses on the heterogeneous intermediate value sizes and the heterogeneous network bandwidth,and conducts an indepth study on the communication performance optimization under the coded MapReduce framework.The main contributions are as follows:1)Aiming at the optimization problem of communication load for coded MapReduce under heterogeneous intermediate value sizes,a generalized coded MapReduce framework is first proposed,in which the allocation of Reduce functions plays an important role.And under the premise of fixed computation load,it is based on the allocation of Reduce functions and the size of intermediate values describe the communication load theoretically.On this basis,a combinatorial optimization problem based on Reduce function allocation is proposed to minimize the communication load.This paper proves that the optimization problem is NP-hard,and then proposes a greedy algorithm with low time complexity and an approximation rate of 2.The experimental results based on the Alibaba Cloud platform and the simulation experimental results both show that the proposed coded MapReduce with the intermediate values size aware can significantly reduce the communication load and shorten the total execution time.2)On the basis of heterogeneous intermediate value sizes,and further for heterogeneous bandwidth scenarios,based on the generalized coded MapReduce framework,this paper studies how to perform Reduce function allocation based on the heterogeneous computing node bandwidth to shorten job execution time.For a fixed computation load,this paper describes the communication time according to the Reduce function allocation,the intermediate value sizes and bandwidth of nodes.Through analysis,the theoretical property of minimizing the communication time is obtained,and on this basis,the corresponding combination optimization problem is proposed,aiming to appropriately the Reduce function is allocated to minimize communication time.After proved that the optimization problem is NPhard,an effective greedy algorithm is proposed.Finally,the experimental results based on the Alibaba Cloud platform and the simulation results show that the proposed coded MapReduce framework can significantly reduce communication time and shorten the total execution time.
Keywords/Search Tags:MapReduce, Coded Computing, Heterogeneous System, Communication Load
PDF Full Text Request
Related items