
Research On The Performance And Scalability Of Data-Parallel Programming Model On Multicore

Posted on: 2012-01-17    Degree: Doctor    Type: Dissertation
Country: China    Candidate: R Chen    Full Text: PDF
GTID: 1488303356971249    Subject: Computer system architecture
Abstract/Summary:
The demand for processing large-scale data has emerged and grown rapidly over the past 10 years. This dramatic growth of data amounts to an "information avalanche". According to statistics from IDC Inc., the amount of information stored in digital form was 161 exabytes in 2006 and reached 988 exabytes in 2010. There is no doubt that data-intensive applications have become one of the most important application domains.

Meanwhile, with the prevalence of multicore processors and the continuously increasing number of cores on a single machine, large-scale data processing has been shown to be a promising way to harness the abundant resources of multicore platforms. The most common way of harnessing multicore resources is through parallel applications. Unfortunately, writing an efficient parallel application is a nightmare for average programmers. In addition to functionality, they also have to consider many issues including data distribution, scalability, load balance, and fault tolerance. Reflecting this, Gartner Inc. identified seven grand challenges in IT for the next 25 years, and in 2008 parallel programming was ranked second among them.

The data-parallel programming model is an appealing answer to these challenges, providing easy-to-use abstractions that hide parallelism issues from users. However, current data-parallel programming models designed for large clusters do not seriously consider several unique features of multicore platforms, such as low-latency inter-core communication, contention for shared caches, and pervasive global failures. Furthermore, current data-parallel programming models emphasize generality while ignoring specialization in some cases, which limits their usage in some application domains.

Based on a detailed analysis of the performance and scalability problems of running the MapReduce programming model on multicore platforms, this dissertation proposes a practical and systematic solution.
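As background for the extensions described next, the following is a minimal single-machine sketch of the plain MapReduce model the dissertation builds on; the function names are illustrative and not part of any actual runtime:

```python
from collections import defaultdict

def map_reduce(inputs, map_fn, reduce_fn):
    """Run a minimal, single-machine MapReduce job entirely in memory."""
    # Map phase: emit (key, value) pairs from every input record.
    intermediate = defaultdict(list)
    for record in inputs:
        for key, value in map_fn(record):
            intermediate[key].append(value)
    # Reduce phase: fold all values collected for each key.
    return {key: reduce_fn(key, values) for key, values in intermediate.items()}

# Word count, the canonical MapReduce example.
def map_words(line):
    for word in line.split():
        yield word, 1

def reduce_counts(word, counts):
    return sum(counts)

result = map_reduce(["a b a", "b c"], map_words, reduce_counts)
# result == {"a": 2, "b": 2, "c": 1}
```

Note that this naive formulation keeps all intermediate data live until the reduce phase, which is precisely the memory-footprint problem the tiling strategy below addresses.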
First, we extend the general MapReduce programming model with a "tiling strategy", called Tiled-MapReduce (TMR). Second, we propose three optimizations that aim at improving memory efficiency, data locality, and task parallelism. Finally, we demonstrate the powerful support Tiled-MapReduce provides for novel applications such as online aggregation and incremental computation.

The main contents and contributions of this dissertation include the following aspects:

· We analyze the performance issues resulting from the differences between multicore and cluster platforms, and propose Tiled-MapReduce, which applies the "tiling strategy" to increase performance and scalability. The runtime divides a large MapReduce job into a number of independent small sub-jobs, and iteratively processes one sub-job at a time with efficient use of resources. We also enhance the fault tolerance strategy for the multicore platform.

· Tiled-MapReduce also enables three optimizations that are otherwise impossible. First, we use dynamic loading and input data buffer reuse to shorten the life cycle and limit the footprint of input and intermediate data. Second, we use a NUCA/NUMA-aware scheduler to fully exploit the memory hierarchy of a multicore system, resulting in better memory and cache locality. Finally, we use software pipelining and work stealing to eliminate idle time and fully harness the CPU cores.

· We implement a prototype of Tiled-MapReduce, named Ostrich, running on an Intel machine with 16 cores. Experiments on four different types of benchmarks show that Ostrich has much better scalability than Phoenix for all benchmarks. Meanwhile, it also saves up to 85% of memory, causes fewer cache misses, and makes more efficient use of CPU cores, resulting in a speedup ranging from 1.2X to 3.3X.

· We design and implement two prototypes to show the support Tiled-MapReduce provides for different application domains. First, the online aggregation system, named Oops, allows users to observe the progress and approximate results of their tasks. Second, the incremental computation system, named OstrichInc, reuses task results at the sub-job level, which supports both fully incremental and partially incremental computation.
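The tiling strategy and the sub-job-level result reuse described above can be sketched as follows. This is an illustrative Python sketch, not the actual Ostrich API; the function and parameter names (`tiled_map_reduce`, `tile_size`, `cache`) are assumptions for exposition:

```python
from collections import defaultdict

def tiled_map_reduce(inputs, map_fn, reduce_fn, tile_size, cache=None):
    """Process a large job as a sequence of small sub-jobs (tiles).

    Each tile is mapped and partially reduced on its own, so only one
    tile's intermediate data is live at a time; `cache` lets unchanged
    tiles reuse their previous partial results (incremental computation).
    """
    cache = {} if cache is None else cache
    tile_partials = []
    for start in range(0, len(inputs), tile_size):
        tile = tuple(inputs[start:start + tile_size])
        if tile in cache:                      # sub-job-level result reuse
            tile_partials.append(cache[tile])
            continue
        intermediate = defaultdict(list)
        for record in tile:                    # map phase for this tile only
            for key, value in map_fn(record):
                intermediate[key].append(value)
        # Partial (combine-style) reduce over this tile's data.
        partial = {k: reduce_fn(k, vs) for k, vs in intermediate.items()}
        cache[tile] = partial                  # remember for incremental reruns
        tile_partials.append(partial)
    # Final reduce: merge the per-tile partial results.
    merged = defaultdict(list)
    for partial in tile_partials:
        for key, value in partial.items():
            merged[key].append(value)
    return {k: reduce_fn(k, vs) for k, vs in merged.items()}

counts = tiled_map_reduce(
    ["a b a", "b c", "a c", "c c"],
    lambda line: ((w, 1) for w in line.split()),
    lambda word, vals: sum(vals),
    tile_size=2,
)
# counts == {"a": 3, "b": 2, "c": 4}
```

The list of per-tile partials is also what makes online aggregation natural in this design: an approximate result can be produced by merging only the tiles finished so far.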
Keywords/Search Tags:Data-parallel programming model, Multicore, MapReduce, Tiling