
OPS: Optimized Shuffle Management for Distributed Computing Frameworks

Posted on: 2021-04-29    Degree: Master    Type: Thesis
Country: China    Candidate: Z X Wu    Full Text: PDF
GTID: 2518306503474144    Subject: Software engineering
Abstract/Summary:
In recent years, distributed computing frameworks such as Hadoop MapReduce and Spark have been widely used for big data processing. With the explosive growth of data volume, companies tend to store the intermediate data of the shuffle stage on disk rather than in memory, so the shuffle stage involves intensive network and disk I/O. In addition, to improve parallelism and reduce straggler tasks, both academia and industry recommend splitting a computing job into a large number of small tasks. However, our research found that as the number of small tasks increases, the number of disk and network I/O requests grows quadratically, which further exacerbates the performance overhead of the shuffle stage.

To evaluate the performance of distributed computing frameworks under different hardware environments and resource scheduling strategies, we present the Framework Resource Quantification (FRQ) model. The FRQ model can predict the execution time of distributed computing jobs and evaluate the performance of the resource scheduling strategies they employ.

To reduce the overhead of the shuffle stage, we propose OPS, an open-source distributed shuffle service built on Spark. OPS provides an independent shuffle service for Spark. Using pre-merge and pre-shuffle strategies, OPS alleviates the I/O overhead of the shuffle stage and efficiently schedules I/O and computing resources. OPS also introduces a slot-based scheduling algorithm to predict and compute the optimal scheduling of reduce tasks, and a taint-redo strategy to ensure the fault tolerance of computing jobs. We evaluate the performance of OPS on a 100-node AWS EC2 cluster. Overall, OPS reduces shuffle overhead by nearly 50%, and on the HiBench test cases it improves end-to-end completion time by nearly 30% on average.
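The quadratic growth claimed above can be sketched with a small back-of-the-envelope calculation. In a MapReduce-style shuffle, every reduce task fetches one block from every map task, so a job with m map tasks and r reduce tasks issues m × r fetch requests; splitting the same job into k times as many tasks on each side multiplies the request count by k². The `premerged_requests` function below is our illustrative reading of why a pre-merge strategy helps (consolidating each node's map outputs per reducer), not the thesis's actual implementation; the numbers are hypothetical.

```python
def shuffle_requests(map_tasks: int, reduce_tasks: int) -> int:
    """Each reducer pulls one shuffle block from each mapper."""
    return map_tasks * reduce_tasks

def premerged_requests(nodes: int, reduce_tasks: int) -> int:
    """Illustrative assumption: if each node pre-merges its local map
    outputs per reducer, a reducer fetches one merged block per node
    instead of one per map task."""
    return nodes * reduce_tasks

# Splitting a job into 10x as many (smaller) tasks on both sides
# multiplies the fetch-request count by 10^2 = 100.
base = shuffle_requests(100, 100)      # 10_000 requests
split = shuffle_requests(1000, 1000)   # 1_000_000 requests
assert split == base * 10 ** 2

# With per-node pre-merging on a 100-node cluster, the split job's
# request count no longer depends on the number of map tasks.
merged = premerged_requests(100, 1000)  # 100_000 requests
assert merged < split
```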
Keywords/Search Tags: Distributed computing, Shuffle, Performance optimization