
APGAS-Oriented Resource Management And Optimization

Posted on: 2015-08-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z J Hao
GTID: 1108330464955364
Subject: Computer system architecture
Abstract/Summary:
The asynchronous partitioned global address space (APGAS) programming model is an important innovation in parallel programming. Cluster computing environments currently show several trends: (1) High parallelism. A cluster may contain tens of thousands or even hundreds of thousands of nodes, so programmers have abundant parallelism to exploit. (2) Hierarchical memory access latency. Most nodes in current clusters have NUMA (Non-Uniform Memory Access) features and may also host coprocessors, and nodes communicate with each other through different kinds of interconnects. With such a topology, memory access performance in a cluster is hierarchical. (3) Heterogeneity. Current clusters contain computing resources that are heterogeneous in both performance and architecture. These trends pose a major challenge to existing parallel programming models. The APGAS model was invented to address this challenge and to significantly improve programmer productivity in cluster computing environments.

The APGAS model has attracted extensive attention in academia. In industry, it is widely used at IBM, where many large-scale APGAS-based applications now run. The major drawback of the current APGAS model is its poor practicability, especially its limited support for resource management and optimization. As a result, it cannot yet be widely used in industry. The major problems behind this limitation are: (1) the reliability of APGAS programs; (2) resource management, which is complicated by the constraint on data affinity in APGAS; (3) global load balance, which is complicated by performance heterogeneity in a cluster; (4) performance, which is affected by architecture heterogeneity in a cluster.
Because APGAS is a new model, academia and industry have focused on its performance, programmability, portability, and applications. Few, if any, studies address the problems listed above.

This dissertation takes X10, a well-known APGAS language, as an example. Based on a comprehensive analysis of the requirements for resource management and optimization in APGAS, it proposes a systematic solution that enables resource management and optimization in APGAS and thereby improves the model's practicability. The dissertation focuses on reliability, resource management, and global load balance in the APGAS model, and on MIC programming in APGAS. These are the first studies of these problems in APGAS. The proposed solution is practical, efficient, and high-performance. It is also transparent to APGAS programmers and backward compatible with existing APGAS applications. Because the solution builds on the features of APGAS, it has little impact on application performance and on the efficient use of computing resources. With this solution, programmers take on no additional burden and can still write APGAS applications with high productivity.

Specifically, the main contributions consist of the following key techniques and systems, each solving a different problem:

· A first attempt to efficiently solve the reliability problem of APGAS, through the design and implementation of the X10-FT system. To make X10 applications highly reliable, X10-FT introduces the classical checkpoint mechanism into the X10 system and combines the features of the APGAS model with well-known distributed computing techniques, including the Paxos protocol and a distributed file system. Single points of failure rarely occur in X10-FT. Detailed evaluation shows that many kinds of X10 applications gain strong fault tolerance from X10-FT, and that the performance loss due to fault tolerance is acceptable: 20% on average in the evaluation.

· Place migration in APGAS to support resource management and optimization and global load balance in cluster computing, and the design and implementation of the X10-PM system based on this idea. The design of X10-PM takes full account of the features of the APGAS model and makes the system transparent to programmers. With the help of the X10-PM compiler, X10 applications support migration at run time without any effort from programmers. While compiling an X10 program, the X10-PM compiler uses static analysis to automatically find the optimal points in the source code for migration at run time. Place migrations at these points incur the least overhead, and program state is guaranteed to be consistent before and after migration. Detailed evaluation shows that the current X10-PM system needs only 4 seconds to migrate a place between nodes in a cluster, which effectively supports resource management and optimization and global load balance in APGAS programs.

· A programming paradigm and corresponding optimization rules for using MIC coprocessors to accelerate APGAS applications. This paradigm takes full account of the features of the APGAS model and the characteristics of MIC coprocessors, and natively supports both the offload mode and the native mode of MIC programming. It also has good backward compatibility: APGAS applications need only trivial modifications to use it. Evaluation shows that with this paradigm, most of the tested applications perform much better with MIC computing resources than the original versions. On our testbed, with one MIC card, most benchmarks double their performance, and some applications reach a speedup of 3.
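
The classical checkpoint mechanism that the abstract names as the basis of X10-FT can be illustrated in miniature. The Python sketch below is not X10-FT's implementation and uses no X10 API. It assumes a single process, a local file standing in for the distributed file system, and no Paxos-style coordination. The names `save_checkpoint`, `load_checkpoint`, `run`, `demo`, and the fault injector `fail_at` are hypothetical and exist only for this example. It shows the general idea: state is saved periodically, and after a failure the computation resumes from the last saved state instead of starting over.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: dump to a temp file, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)  # atomic rename, so a crash never leaves a torn file


def load_checkpoint(path):
    """Return the last saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def run(path, n_steps, interval, fail_at=None):
    """Sum 0..n_steps-1, checkpointing every `interval` completed steps.

    `fail_at` is a hypothetical fault injector: it raises before that step
    is processed, to mimic a node crash.
    """
    state = load_checkpoint(path) or {"step": 0, "total": 0}
    while state["step"] < n_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated node failure")
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % interval == 0:
            save_checkpoint(path, state)
    return state["total"]


def demo():
    """Crash once mid-run, then restart and finish from the last checkpoint."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "ckpt.pkl")
        try:
            run(path, n_steps=100, interval=10, fail_at=57)
        except RuntimeError:
            pass  # work done after the last checkpoint is lost
        resumed_from = load_checkpoint(path)["step"]
        total = run(path, n_steps=100, interval=10)
        return resumed_from, total
```

Calling `demo()` injects a failure at step 57, so the restarted run resumes from the checkpoint taken at step 50 and still produces the same sum as an uninterrupted run. The checkpoint interval trades run-time overhead against the amount of work lost on failure, which is the kind of cost that the reported average performance loss reflects.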
Keywords/Search Tags: Asynchronous partitioned global address space (APGAS), X10, Reliability, Fault Tolerance, Heterogeneity, Process Migration, Resource Management, Global Load Balance, MIC Programming