Research On Shuffle Technology Of Separation Of Computing And Storage In Big Data System

Posted on:2021-08-23

Degree:Master

Type:Thesis

Country:China

Candidate:Z Y Hu

Full Text:PDF

GTID:2518306104488224

Subject:Computer software and theory

Abstract/Summary:


Shuffle is the bridge connecting the mapper side and the reducer side. The reliability and performance of the shuffle service directly affect the execution efficiency of an application. The existing shuffle mechanism aggregates data in memory, which tends to cause data spills and write amplification. When a reduce task pulls data, it generates a large number of small, random I/O requests, so I/O queue waiting time and disk seek time account for a large part of the total disk service time. D-Shuffle is an efficient shuffle service that separates computing from storage to address these problems. It sends the data computed by multiple mappers to a distributed shuffle service process dedicated to shuffling. The shuffle process uses a hybrid memory layout of dynamic random access memory (DRAM) and non-volatile memory (NVM): keys are placed in DRAM and values in NVM. Data sent from multiple mappers are merged, sorted as needed, and finally written to a distributed file system. This process reduces data spills on the computing nodes and lets the reducer side fetch data originating from multiple mappers with fewer seeks. At the same time, the distributed file system ensures the reliability of the shuffle data. Because shuffle data may still be lost under extreme conditions, D-Shuffle includes an interruptible re-compute mechanism that reduces the overhead of recalculation. D-Shuffle is implemented on Spark. The experimental results show that D-Shuffle performs significantly better than Spark's existing shuffle mechanism: it avoids write amplification in the map phase, reduces recalculation overhead by 37% on average, and improves end-to-end job performance by 23%-33%.
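
The data path described in the abstract (keys held in DRAM, values held in NVM, records from several mappers merged into one sorted run per reduce partition) can be illustrated with a minimal sketch. This is not the thesis's implementation: the class `ToyShuffleService` and its methods `push` and `flush` are hypothetical names, a Python dict stands in for the DRAM key index, a `bytearray` stands in for NVM, and the returned dictionary stands in for the files written to the distributed file system.

```python
from collections import defaultdict


class ToyShuffleService:
    """Toy model only: key index in a dict ("DRAM"), values in an append-only log ("NVM")."""

    def __init__(self):
        # Assumption: a bytearray stands in for the NVM value region.
        self.value_log = bytearray()
        # Assumption: partition -> [(key, offset, length)] stands in for the DRAM key index.
        self.index = defaultdict(list)

    def push(self, partition, key, value):
        """Called on behalf of a mapper: append the value, remember where it lives."""
        offset = len(self.value_log)
        self.value_log += value
        self.index[partition].append((key, offset, len(value)))

    def flush(self, sort=True):
        """Merge records from all mappers into one contiguous run per partition."""
        output = {}
        for partition, entries in self.index.items():
            if sort:
                # Only the small (key, offset, length) tuples are sorted;
                # the values stay in place until they are copied out.
                entries = sorted(entries, key=lambda e: e[0])
            output[partition] = [
                (key, bytes(self.value_log[off:off + length]))
                for key, off, length in entries
            ]
        return output
```

In this sketch, several mappers call `push` with the target reduce partition, and a single `flush` produces one merged run per partition, so a reducer would read its partition as one sequential block rather than fetching a small piece from every mapper. Sorting touches only the key index, which is the motivation the abstract gives for keeping keys in DRAM and values in NVM.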

Keywords/Search Tags:

Big data system, Shuffle, Separation of computing and storage, Non-volatile memory


Related items

1	The Design And Implementation Of A New Storage System For Distributed Non-Volatile Memory
2	Study On Performance Optimization Of Hybrid Memory System Based On Non-volatile Memory
3	The Design And Implementation Of Low-latency Distributed Key/Value Storage Based On Non-volatile Memory And RDMA
4	Key-value Storage Engine On Non-volatile Memory
5	Study On Critical Techniques Of Storage System Based On Non-Volatile Memory
6	Based On Nand Non-volatile Flash Memory Chips Of Solid-state Storage Technology Application And Performance Improvement Of The Research
7	Research On B+-Tree Based Efficient Indexing Structure For Non-Volatile Memory
8	Research On Optimization Of Frequent Pattern Mining Algorithm Based On Non-volatile Memory
9	Investigation Of The Memory Mode And Physical Mechanism For The Next Generation Semiconducting Non-volatile Memory Devices
10	Exploring Energy Optimization For Non-Volatile Memory