
Analysis and Optimization of Scientific-Computing Application Performance Based on MapReduce

Posted on: 2011-04-03
Degree: Master
Type: Thesis
Country: China
Candidate: S K Zhu
Full Text: PDF
GTID: 2208360305497518
Subject: Computer software and theory

Abstract/Summary:
Google Inc. proposed MapReduce, a distributed programming model that makes parallel programming far easier than before. Programmers no longer need to spend time on difficult jobs such as task scheduling, resource management, and fault tolerance. Owing to its simplicity and effectiveness, the model is now widely adopted in business applications that process huge amounts of data. Because MapReduce takes over the jobs of scheduling tasks onto computing nodes, recovering tasks from execution errors, and balancing load across the entire server cluster, the development of a distributed application is greatly sped up, especially for applications that compute over huge amounts of data.

Scientific-computing applications, a category of applications with great practical value, had never been ported to a MapReduce framework before. Our work took two applications from SPLASH-2, Water and Radixsort, and evaluated them on two open-source MapReduce frameworks, Hadoop and Phoenix, designed for cluster environments and multi-core platforms respectively. We then analyzed the performance bottlenecks and located the corresponding design flaws in MapReduce, carrying out extensive evaluation on both the multi-core platform and the cluster environment.

The experimental results show that on a multi-core platform the memory space of a single node limits the scale of the application. When running on a cluster MapReduce framework, scientific-computing applications suffer poor performance: although the degree of parallelism improves, the overheads introduced by data transmission and transformation dominate the execution. Lacking support from the underlying storage system, scientific-computing applications slow down dramatically as the input size grows.
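The MapReduce model described above can be illustrated with a minimal sequential sketch (not the thesis's code; the function names and the word-count example are illustrative). Real frameworks such as Hadoop and Phoenix execute the same two phases in parallel while handling scheduling, fault tolerance, and load balancing on the programmer's behalf.

```python
from collections import defaultdict

def mapreduce(records, map_fn, reduce_fn):
    """Sequential sketch of the MapReduce model: map each input
    record to (key, value) pairs, group values by key, then reduce
    each group to a single result."""
    # Map phase: emit intermediate key/value pairs
    intermediate = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            intermediate[key].append(value)
    # Reduce phase: combine all values that share a key
    return {key: reduce_fn(key, values)
            for key, values in intermediate.items()}

# Word count, the canonical MapReduce example
counts = mapreduce(
    ["map reduce map", "reduce"],
    map_fn=lambda line: [(word, 1) for word in line.split()],
    reduce_fn=lambda key, values: sum(values),
)
# counts == {"map": 2, "reduce": 2}
```

The programmer supplies only `map_fn` and `reduce_fn`; everything between the two phases (the shuffle, here a dictionary of lists) belongs to the framework, which is precisely why porting an application amounts to recasting its computation into these two functions.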
Moreover, coding against the original MapReduce interfaces is not easy, costing developers extra effort during programming.

This paper also offers suggestions for enhancing the MapReduce framework to suit these applications. For the MapReduce model, we suggest providing more types of programming interfaces to satisfy the requirements of scientific computing. To avoid unnecessary data communication when several tasks deal with the same chunk of data, the scheduler should be able to assign those tasks to the same node. In cluster MapReduce, the distributed storage layer needs to be augmented to natively support some complex data structures that scientific-computing applications use frequently.
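The locality-aware scheduling suggested above can be sketched as a simple assignment rule (a hypothetical illustration, assuming a known mapping from data chunks to the nodes that store them; the function and parameter names are not from the thesis):

```python
def assign_tasks(tasks, chunk_location):
    """Hypothetical locality-aware assignment: every task that reads
    a given data chunk is placed on the node already holding that
    chunk, so the chunk is never re-transmitted over the network."""
    return {task_id: chunk_location[chunk_id]
            for task_id, chunk_id in tasks}

placement = assign_tasks(
    tasks=[("t1", "chunkA"), ("t2", "chunkA"), ("t3", "chunkB")],
    chunk_location={"chunkA": "node1", "chunkB": "node2"},
)
# t1 and t2 both read chunkA, so both are placed on node1
```

A production scheduler would also have to weigh node load against locality, but even this rule shows how co-locating tasks that share input removes the transmission overhead identified in the experiments.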
Keywords/Search Tags:MapReduce, Parallel Programming, Scientific Computing