
An I/O-Efficient MapReduce System

Posted on: 2014-04-10
Degree: Master
Type: Thesis
Country: China
Candidate: X Xie
Full Text: PDF
GTID: 2268330422962228
Subject: Computer technology
Abstract/Summary:
Recently, the MapReduce platform has been widely used for large-scale data processing and analysis. It works well if the cluster's hardware is reasonably configured. However, based on our survey, hardware configurations in small and medium-sized enterprises are not appropriate for big data. The situation is more challenging in HPC systems because of the growing performance gap between CPU and I/O. In most HPC clusters, memory is a limited resource and is not sufficient for the needs of big data processing.

Based on this observation, we develop Dumbo, a new MapReduce system that aims to achieve the best possible performance on existing hardware configurations through global memory management and serialized disk access. To realize global memory management, we manage all memory centrally and collect detailed memory usage for every operation in Hadoop. Dumbo forecasts when granting memory to an operation will yield better performance and decides when it is best to spill to disk. Furthermore, we serialize the disk operations on the data that Hadoop processes.

We conducted extensive experiments and compared Dumbo against the native MapReduce platform. The experiments showed that Dumbo can improve performance by 20% in terms of total job execution time. In particular, when I/O capacity is poorly matched to the other system resources, performance can improve by as much as 100%. The proposed Dumbo system is promising for I/O-constrained platforms and could have a practical impact on them.
Keywords/Search Tags: MapReduce, Hadoop, Dumbo, I/O Optimization
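
The two mechanisms named in the abstract, a global memory budget shared by all operations and serialized disk access for spills, can be illustrated with a minimal sketch. This is not Dumbo's implementation: the class `GlobalMemoryManager`, its methods, and the "spill the largest holder" policy are all hypothetical names and choices made only for illustration, and the actual disk write is replaced by a log entry.

```python
import threading


class GlobalMemoryManager:
    """Hypothetical central memory budget shared by all operations.

    Illustrative only; this is not Dumbo's or Hadoop's actual API.
    """

    def __init__(self, budget_bytes):
        self.budget = budget_bytes
        self.used = {}                      # operation name -> bytes currently held
        self.lock = threading.Lock()        # protects the bookkeeping above
        self.disk_lock = threading.Lock()   # lets only one spill use the disk at a time
        self.spill_log = []                 # order in which spills happened

    def free_bytes(self):
        return self.budget - sum(self.used.values())

    def request(self, op, nbytes):
        """Grant memory if the global budget allows it; otherwise return False."""
        with self.lock:
            if nbytes <= self.free_bytes():
                self.used[op] = self.used.get(op, 0) + nbytes
                return True
            return False

    def spill(self, op):
        """Release everything `op` holds, one spill at a time."""
        with self.disk_lock:                # serialized disk access
            with self.lock:
                released = self.used.pop(op, 0)
            # Stand-in for a real sequential write to disk.
            self.spill_log.append((op, released))
        return released

    def request_or_spill(self, op, nbytes):
        """Assumed policy: on failure, spill the largest holder and retry once."""
        if self.request(op, nbytes):
            return True
        with self.lock:
            victim = max(self.used, key=self.used.get, default=None)
        if victim is not None:
            self.spill(victim)
        return self.request(op, nbytes)
```

With a budget of 100 bytes, granting 60 to one map task and 30 to another leaves too little for a 50-byte reduce request, so under the assumed policy the largest holder is spilled first and the request then succeeds. A real system would presumably base the spill decision on a performance forecast, as the abstract describes, rather than on size alone.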