Research Of In-Memory MapReduce System For Memory Efficiency Optimization

Posted on: 2017-05-19
Degree: Master
Type: Thesis
Country: China
Candidate: C Pei
Full Text: PDF
GTID: 2348330503489859
Subject: Computer system architecture
Abstract/Summary:
Big data processing and analytics have become central to the era of big data. To meet the demand for time-efficient data processing, big data processing systems based on in-memory computing have become a new research hotspot. However, compared with the CPU configuration of existing high-performance computing clusters, the memory configuration is clearly insufficient. When MapReduce systems run data-intensive applications on such clusters, unnecessary disk I/O operations occur, so memory efficiency needs to be improved. Hash-based shuffle, particularly large-scale shuffle, can significantly degrade job performance through excessive file operations and unreasonable use of memory. Some intermediate data unnecessarily spill to disk when memory usage is unevenly distributed or when memory runs out.

In this paper, we present a novel memory efficiency optimization system called MEOS to solve these memory inefficiency problems. MEOS focuses on optimizing the hash-based shuffle mechanism, the task scheduler, and load balancing. Firstly, we analyze the causes of the performance degradation that occurs as the number of partitions increases under the hash-based shuffle mechanism, and we introduce the Object Reusing Shuffle mechanism to overcome these deficiencies and improve memory usage efficiency. Secondly, we design and implement a task scheduler that schedules tasks according to the memory usage of tasks and the resource situation of worker nodes. This scheduler, which we call the Memory-Aware Task Scheduler, dynamically calculates the optimal task concurrency of each worker node and minimizes expensive disk spilling. Thirdly, we develop algorithms, collectively called the Load Balancing Optimizer, that implement load balancing by adjusting or resetting the preferred locations of tasks.

Extensive experiments are conducted for comparison with the original Spark platform. Experimental results on representative workloads demonstrate that the proposed approaches can decrease the overall job execution time and improve memory efficiency in memory-constrained clusters.
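
The abstract does not give the formula the Memory-Aware Task Scheduler uses, so the following is only a minimal sketch of the general idea of deriving a per-worker task concurrency from memory rather than from CPU cores alone. The function name `task_concurrency`, its parameters, and the rule itself (fit as many tasks as the free memory allows, capped by the core count, with at least one task) are assumptions made for illustration and are not taken from MEOS.

```python
def task_concurrency(free_memory_mb: int, est_task_memory_mb: int, cpu_cores: int) -> int:
    """Hypothetical rule: number of tasks a worker may run at once.

    Assumption (not from the thesis): a task spills to disk when its
    estimated memory does not fit, so we admit only as many tasks as fit
    in free memory, never more than the core count, and never fewer than one.
    """
    if est_task_memory_mb <= 0:
        raise ValueError("estimated task memory must be positive")
    tasks_that_fit = free_memory_mb // est_task_memory_mb
    return max(1, min(cpu_cores, tasks_that_fit))


# Hypothetical worker states: (free memory in MB, CPU cores).
workers = {"worker-1": (8192, 16), "worker-2": (512, 4), "worker-3": (65536, 16)}

# Recompute the concurrency for each worker, assuming 1024 MB per task.
concurrency = {
    name: task_concurrency(free_mb, 1024, cores)
    for name, (free_mb, cores) in workers.items()
}
```

Under these assumed numbers, a worker with ample cores but little free memory is throttled by memory, while a worker with ample memory is still bounded by its cores. A real scheduler would re-evaluate this as memory usage changes at run time.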
Keywords/Search Tags: In-Memory MapReduce, In-Memory Computing, Shuffle, Task Scheduler