
Analysis and Optimization of Scientific-Computing Application Performance Based on MapReduce

Posted on: 2011-04-03
Degree: Master
Type: Thesis
Country: China
Candidate: S K Zhu
Full Text: PDF
GTID: 2208360305497518
Subject: Computer software and theory

Abstract/Summary:
Google Inc. proposed MapReduce, a distributed programming model that makes parallel programming far easier than before. Programmers no longer need to spend time on difficult jobs such as task scheduling, resource management, and fault tolerance. Owing to its simplicity and effectiveness, the model is now widely adopted in business applications that process huge amounts of data. Because MapReduce takes over the jobs of scheduling tasks onto computing nodes, recovering tasks from execution errors, and balancing load across the entire server cluster, the development of a distributed application is greatly sped up, especially for applications that compute over huge amounts of data.

Scientific-computing applications, a category of applications with great practical value, had never been ported to a MapReduce framework before. Our work took two applications from SPLASH-2, Water and Radixsort, and evaluated them on two open-source MapReduce frameworks, Hadoop and Phoenix, designed for cluster environments and multi-core platforms respectively. We then analyzed the performance bottlenecks and located the corresponding design flaws in MapReduce, carrying out extensive evaluation on both the multi-core platform and the cluster environment.

The experimental results show that on a multi-core platform the memory space of a single node limits the scale of the application. When running on a cluster MapReduce framework, scientific-computing applications suffer poor performance: although the degree of parallelism improves, the overheads introduced by data transmission and transformation dominate the execution. Lacking support from the underlying storage system, scientific-computing applications slow down dramatically as the input size grows.
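The MapReduce model described above can be illustrated with a minimal sequential sketch (not the thesis's code; the function names and the word-count example are illustrative). Real frameworks such as Hadoop and Phoenix execute the same two phases in parallel while handling scheduling, fault tolerance, and load balancing on the programmer's behalf.

```python
from collections import defaultdict

def mapreduce(records, map_fn, reduce_fn):
    """Sequential sketch of the MapReduce model: map each input
    record to (key, value) pairs, group values by key, then reduce
    each group to a single result."""
    # Map phase: emit intermediate key/value pairs
    intermediate = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            intermediate[key].append(value)
    # Reduce phase: combine all values that share a key
    return {key: reduce_fn(key, values)
            for key, values in intermediate.items()}

# Word count, the canonical MapReduce example
counts = mapreduce(
    ["map reduce map", "reduce"],
    map_fn=lambda line: [(word, 1) for word in line.split()],
    reduce_fn=lambda key, values: sum(values),
)
# counts == {"map": 2, "reduce": 2}
```

The programmer supplies only `map_fn` and `reduce_fn`; everything between the two phases (the shuffle, here a dictionary of lists) belongs to the framework, which is precisely why porting an application amounts to recasting its computation into these two functions.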
Moreover, coding against the original MapReduce interfaces is not easy, costing developers extra effort during programming.

This paper also offers suggestions for enhancing the MapReduce framework to suit these applications. For the MapReduce model, we suggest providing more types of programming interfaces to satisfy the requirements of scientific computing. To avoid unnecessary data communication when several tasks deal with the same chunk of data, the scheduler should be able to assign those tasks to the same node. In cluster MapReduce, the distributed storage layer needs to be augmented to natively support some complex data structures that scientific-computing applications use frequently.
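The locality-aware scheduling suggested above can be sketched as a simple assignment rule (a hypothetical illustration, assuming a known mapping from data chunks to the nodes that store them; the function and parameter names are not from the thesis):

```python
def assign_tasks(tasks, chunk_location):
    """Hypothetical locality-aware assignment: every task that reads
    a given data chunk is placed on the node already holding that
    chunk, so the chunk is never re-transmitted over the network."""
    return {task_id: chunk_location[chunk_id]
            for task_id, chunk_id in tasks}

placement = assign_tasks(
    tasks=[("t1", "chunkA"), ("t2", "chunkA"), ("t3", "chunkB")],
    chunk_location={"chunkA": "node1", "chunkB": "node2"},
)
# t1 and t2 both read chunkA, so both are placed on node1
```

A production scheduler would also have to weigh node load against locality, but even this rule shows how co-locating tasks that share input removes the transmission overhead identified in the experiments.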
Keywords/Search Tags:MapReduce, Parallel Programming, Scientific Computing