
The Hierarchical Graphs Of Multi-core Cluster Model Oriented Design And Implementation

Posted on: 2013-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Xiao
GTID: 2248330395950376
Subject: Computer software and theory
Abstract/Summary:
MapReduce is a parallel programming model proposed by Google. It is simple but expressive, and it has become the core infrastructure of Google's computation framework. Hadoop is an open-source implementation of MapReduce. Thanks to support from big-data companies such as Yahoo!, Hadoop has grown rapidly and gained widespread popularity.

The elegance of the MapReduce programming model and the ready availability of the Hadoop implementation have opened opportunities to run a variety of big-data applications on commodity clusters, which are now usually equipped with multi-core processors. This paper argues that typical multi-core clusters have multiple levels of data locality and parallelism that affect performance. Unfortunately, little existing literature characterizes or optimizes application performance on such platforms.

This paper characterizes the performance limitations of typical MapReduce applications on multi-core Hadoop clusters. Hadoop, the currently popular MapReduce implementation, is a JVM-based runtime. Our study shows that the current JVM-based runtime (i.e., TaskWorker) fails to exploit data locality and task parallelism at the single-node level. Based on this study, we propose a hierarchical MapReduce model that extends Hadoop by integrating a C-based MapReduce implementation for shared-memory multi-core machines; the resulting system is called Azwraith. The hierarchical scheme enables MapReduce applications to exploit locality and parallelism at both the cluster level and the single-node level. To reuse data across job boundaries, we extend Azwraith with an effective in-memory cache scheme that significantly reduces network and disk traffic.

Performance evaluation on a small-scale (7-node) cluster shows that Azwraith, combined with these optimizations, outperforms the original Hadoop implementation by 1.4X to 3.5X.
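The two-level idea described in the abstract can be illustrated with a small toy sketch. This is not Azwraith or Hadoop code: the function names (`map_split`, `node_level_job`, `cluster_level_job`), the word-count workload, and the per-node cache keyed by node name are all hypothetical, chosen only to show the structure of a node-level job nested inside a cluster-level job with results reused across jobs. Python threads are used here purely to show the shape of the computation; they do not provide the multi-core speedup a C-based runtime would.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_split(text):
    """Map step for one input split: count the words it contains."""
    return Counter(text.split())


def node_level_job(splits, workers=4):
    """Node level: map the local splits concurrently, then combine in memory."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_split, splits))
    combined = Counter()
    for partial in partials:
        combined.update(partial)  # Counter.update adds counts
    return combined


def cluster_level_job(splits_by_node, cache=None):
    """Cluster level: one node-level job per node, then merge the results.

    `cache` is a hypothetical in-memory store that lets a later job
    reuse a node's combined output instead of recomputing it.
    """
    cache = {} if cache is None else cache
    total = Counter()
    for node, splits in splits_by_node.items():
        if node not in cache:
            cache[node] = node_level_job(splits)
        total.update(cache[node])
    return total


# Toy input: each (hypothetical) node holds its own local splits.
splits_by_node = {
    "node-1": ["a b a", "b c"],
    "node-2": ["a c c"],
}
cache = {}
result = cluster_level_job(splits_by_node, cache)
```

Passing the same `cache` dictionary to a second call of `cluster_level_job` skips the node-level work entirely, which is the toy analogue of reusing data across job boundaries.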
Keywords/Search Tags: MapReduce, Performance, Multicore