
Analysis And Optimization Of Memory Scheduling Algorithm Of Spark Shuffle

Posted on: 2017-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Chen
GTID: 2308330482481848
Subject: Computer application technology
Abstract/Summary:
With the development and growing popularity of distributed computing frameworks, Spark has become an active research topic in the open source community because of its advanced design. In a large-scale data computing framework, the design and performance of the shuffle phase directly affect the performance and throughput of the whole system. This paper focuses on investigating and optimizing memory allocation among tasks during Spark shuffle. Unlike existing shuffle optimizations, this paper identifies a shuffle efficiency bottleneck caused by unbalanced memory requirements among tasks. To overcome this weakness of the Fair Allocation algorithm, the paper proposes SBSA, a Spill-based Self-adaptive memory scheduling algorithm that adjusts allocation according to each task's spill history. Experiments show that the SBSA algorithm can effectively improve memory utilization and overall system performance. The contributions of this paper are summarized as follows:
1) Describes the mainstream distributed computing framework MapReduce, including its programming model, current status, and disadvantages. Introduces the design concept of Spark, analyzes its improvements over the MapReduce model, and compares the advantages and disadvantages of the two.
2) Studies the concept, development, and existing optimizations of Spark shuffle. Analyzes the memory scheduling of Spark shuffle by reading the source code and points out the disadvantages of the Fair Allocation algorithm.
3) Proposes a Spill-based Self-adaptive memory scheduling algorithm to overcome the weakness of the Fair Allocation algorithm. The algorithm is designed in detail, including the calculation of free memory, the ratio of free memory that can be used by a key task, and the upper memory limit of each task.
4) Evaluates the performance of the SBSA algorithm by comparing it with the First-Come First-Served algorithm and the Fair Allocation algorithm in two different experiments. Experimental results show that the SBSA algorithm can greatly improve the efficiency of applications with heterogeneous data distribution.
From the perspective of overall performance, the SBSA algorithm can take advantage of free memory, improve resource utilization, and, to a certain extent, ease memory shortage.
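The abstract does not give the SBSA formulas, so the following Python sketch is only an illustration of the general idea, not the thesis's algorithm. It assumes a fair-allocation rule in which each of N active tasks may hold at most 1/N of the shuffle memory pool, treats any task that has already spilled as a "key task", and lends a fraction of the unused headroom of non-spilling tasks to key tasks in proportion to their spill counts. The names `TaskState`, `fair_cap`, `adaptive_caps`, and `key_ratio`, as well as the proportional weighting, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskState:
    used: int         # bytes of shuffle memory the task currently holds
    spill_count: int  # spills recorded so far (the task's "spill history")


def fair_cap(pool: int, n_tasks: int) -> int:
    """Assumed fair-allocation cap: each active task gets at most 1/N of the pool."""
    return pool // n_tasks


def adaptive_caps(pool: int, tasks: dict, key_ratio: float = 0.5) -> dict:
    """Hypothetical spill-aware caps (an illustration, not the thesis's formulas).

    Tasks that never spilled give up `key_ratio` of their unused fair share;
    the lent memory is split among spilling ("key") tasks by spill count.
    """
    base = fair_cap(pool, len(tasks))
    key = [name for name, t in tasks.items() if t.spill_count > 0]
    if not key:
        # Nobody is short of memory: fall back to plain fair caps.
        return {name: base for name in tasks}

    caps = {}
    lendable = 0
    for name, t in tasks.items():
        if t.spill_count == 0:
            lent = int(max(base - t.used, 0) * key_ratio)
            caps[name] = base - lent  # lower the cap by what was lent
            lendable += lent
        else:
            caps[name] = base

    total_spills = sum(tasks[name].spill_count for name in key)
    for name in key:
        # Integer division keeps the sum of all caps within the pool.
        caps[name] += lendable * tasks[name].spill_count // total_spills
    return caps


if __name__ == "__main__":
    demo = {
        "a": TaskState(used=100, spill_count=0),
        "b": TaskState(used=250, spill_count=3),
        "c": TaskState(used=250, spill_count=1),
        "d": TaskState(used=50, spill_count=0),
    }
    print(adaptive_caps(1000, demo))
```

In this sketch, lowering the caps of the lending tasks by exactly the amount lent keeps the sum of all caps within the pool, so the adaptive caps never overcommit memory relative to the fair caps.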
Keywords/Search Tags:Spark, Shuffle, Spill-based, Memory Scheduling, Spill History