Research Of In-Memory MapReduce System For Memory Efficiency Optimization

Posted on:2017-05-19

Degree:Master

Type:Thesis

Country:China

Candidate:C Pei

Full Text:PDF

GTID:2348330503489859

Subject:Computer system architecture

Abstract/Summary:

Big data processing and analytics have become central to the big data era. To meet the demand for highly time-efficient data processing, big data processing systems based on in-memory computing have become a new research hotspot. However, existing high-performance computing clusters are clearly short of memory relative to their CPU resources. When MapReduce systems run data-intensive applications on such clusters, unnecessary disk I/O occurs and memory is used inefficiently. Hash-based shuffle, particularly large-scale shuffle, can significantly degrade job performance through excessive file operations and unreasonable memory use. In addition, some intermediate data spill to disk unnecessarily when memory usage is unevenly distributed or when memory runs out.

In this thesis, we present MEOS, a novel memory efficiency optimization system that addresses these memory inefficiency problems. MEOS focuses on three areas: the hash-based shuffle mechanism, task scheduling, and load balancing. First, we analyze why performance degrades as the number of partitions grows under the hash-based shuffle mechanism, and we introduce an Object Reusing Shuffle mechanism that overcomes these deficiencies and improves memory usage efficiency. Second, we design and implement a Memory-Aware Task Scheduler, which schedules tasks according to their memory usage and the resource status of worker nodes; it dynamically calculates the optimal task concurrency of each worker node to minimize expensive disk spilling. Third, we develop a set of algorithms, called the Load Balancing Optimizer, that achieve load balancing by adjusting or resetting the preferred locations of tasks.

Extensive experiments compare MEOS with the original Spark platform. Experimental results on representative workloads show that the proposed approaches reduce overall job execution time and improve memory efficiency in memory-constrained clusters.
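The abstract does not state how the Memory-Aware Task Scheduler computes "optimal task concurrency". A minimal sketch, assuming each task's memory demand can be estimated in advance, is to cap a node's concurrent tasks at the number whose estimated memory fits in the node's free memory, never exceeding its core count. The names `WorkerNode`, `optimal_concurrency`, and `assign_tasks` are hypothetical and are not part of MEOS or Spark.

```python
from dataclasses import dataclass


@dataclass
class WorkerNode:
    """Hypothetical view of a worker's resources (illustrative names only)."""
    name: str
    cores: int
    free_memory_mb: int


def optimal_concurrency(node: WorkerNode, est_task_memory_mb: int) -> int:
    """Largest number of concurrent tasks whose estimated memory fits on the node.

    Assumption: every task needs roughly est_task_memory_mb of execution memory,
    and running more tasks than fit would push intermediate data to disk.
    """
    if est_task_memory_mb <= 0:
        # No memory estimate: fall back to one task per core.
        return node.cores
    fit_in_memory = node.free_memory_mb // est_task_memory_mb
    # Keep at least one task so the node still makes progress (it may spill).
    return max(1, min(node.cores, fit_in_memory))


def assign_tasks(nodes, pending_tasks, est_task_memory_mb):
    """One greedy scheduling round: fill each node up to its memory-aware limit.

    Returns (assignment per node name, tasks left for a later round).
    """
    assignment = {}
    queue = list(pending_tasks)
    for node in nodes:
        slots = optimal_concurrency(node, est_task_memory_mb)
        assignment[node.name] = queue[:slots]
        queue = queue[slots:]
    return assignment, queue
```

For example, with 1024 MB per task, a 16-core node with 8192 MB free runs 8 tasks rather than 16, while a 16-core node with 2048 MB free runs only 2; the remaining tasks wait instead of forcing spills. The design choice here is to trade some CPU parallelism for staying within memory; a real scheduler would also need per-task memory estimates, which this sketch simply assumes.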

Keywords/Search Tags:

In-Memory MapReduce, In-Memory Computing, Shuffle, Task Scheduler

Related items

1	Research Of Memory Pressure In Shared Executors For Distributed Data Processing Systems
2	Optimization Of Spark Task Scheduler For Shuffle Operators
3	Research On Key Technologies Of Memory Architecture For In-memory Computing
4	Research On Significant Technologies Of Performance Optimization On In-memory Computing Framework
5	Task Scheduling And Shuffle Scheduling For MapReduce Jobs
6	Design Of MapReduce Task Scheduling Algorithms In Heterogeneous Hadoop Cluster
7	Design And Implementation Of DDR3 Memory Controller
8	Energy-Efficient Management For Memory System Based On Access Patterns Of Applications
9	Analysis And Optimization Of Memory Scheduling Algorithm Of Spark Shuffle
10	Analysis And Visualization Of Memory Access Characteristics In Heterogeneous Memory