Improving performance in Hadoop MapReduce

Posted on: 2015-09-24
Degree: M.S.
Type: Thesis
University: Oklahoma State University
Candidate: Aina, Ademola Chukwudi
Full Text: PDF
GTID: 2478390017997691
Subject: Computer Science
Abstract/Summary:
Hadoop MapReduce is a parallel, distributed programming model for processing large data sets, or so-called big data, on a cluster. The basic idea of MapReduce is to split a large input data set into many small pieces and assign these pieces to different devices for processing [5]. In this thesis, we evaluate the performance of the MapReduce framework. MapReduce can be improved to perform speculative execution with maximum performance, and optimizing the cost of computation and the cost of communication helps achieve that. The optimization proceeds in two steps. The first step measures the processing power of each machine and distributes tasks according to each machine's capacity. The second step measures the communication overheads and distributes tasks in the system for a given job or workload. To this end, we represent Hadoop MapReduce execution with a functional model and develop an optimization model for improving the system's performance. Our experiments show that the proposed optimized functional model outperforms the regular functional model of the Hadoop MapReduce system by a factor of 2.
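As an illustration of the first step only, the following is a minimal sketch of distributing tasks in proportion to measured machine capacity. It is not the thesis's optimization model: the function name `distribute_tasks`, the capacity scores, and the largest-remainder rounding are all assumptions made for this example.

```python
def distribute_tasks(num_tasks, capacities):
    """Split num_tasks across machines in proportion to their capacity.

    capacities maps a (hypothetical) machine name to a measured
    processing-power score; any positive unit works, since only the
    ratios between machines matter.
    """
    total = sum(capacities.values())
    if num_tasks < 0 or total <= 0:
        raise ValueError("need a non-negative task count and positive capacity")

    # Ideal (possibly fractional) share of tasks for each machine.
    shares = {m: num_tasks * c / total for m, c in capacities.items()}

    # Start from the whole-number part of each share.
    allocation = {m: int(s) for m, s in shares.items()}

    # Hand out the tasks lost to rounding, largest fractional part first.
    leftover = num_tasks - sum(allocation.values())
    by_remainder = sorted(shares, key=lambda m: shares[m] - allocation[m],
                          reverse=True)
    for m in by_remainder[:leftover]:
        allocation[m] += 1

    return allocation


if __name__ == "__main__":
    # Illustrative capacity scores, not measurements from the thesis.
    print(distribute_tasks(10, {"node-a": 1.0, "node-b": 3.0, "node-c": 1.0}))
```

A machine with three times the capacity score receives roughly three times as many tasks. The second step, accounting for communication overheads, is not modeled here.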
Keywords/Search Tags: MapReduce, Hadoop, Model, Performance