Analysis And Optimization Of Scalability For Parallel Computing

Posted on: 2012-09-03
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Y Wang
Full Text: PDF
GTID: 1118330341451713
Subject: Computer Science and Technology
Abstract/Summary:
Parallel computing is the main approach to improving the performance of computer systems. As system size increases, the architecture becomes more and more complex, and issues of programming, memory, energy consumption, reliability, and so on become more serious, which limits the scalability of parallel computing to some extent. The concept of scalability has been extended to integrate more factors than performance alone, and it is worth re-examining and studying.

This paper focuses on the relationship between energy consumption and the scalability of parallel computing, and on the relationship between reliability and the scalability of parallel computing, i.e. energy consumption scalability and reliability scalability. These are discussed in the energy consumption part and the reliability part of this paper, respectively.

The major contributions of the energy consumption part are:

1. Proposing the energy consumption scalability model (Chapter 2)
A metric model is a prerequisite for studying scalability. Based on the speedup model, this paper considers the relationship between energy consumption and performance, and then builds the energy-efficiency speedup model and the ratio of energy consumption to performance model. These measure the efficiency of energy consumption and the degree to which energy consumption growth matches performance improvement, respectively. The former is the metric of energy consumption scalability, and the latter concerns the relationship between energy consumption and performance. Based on the ratio of energy consumption to performance, parallel computing systems are categorized into red scalable systems, yellow scalable systems, and green scalable systems.

2. Proposing the energy wall theory (Chapter 3)
Currently, there is no unified understanding of the "Energy Wall", especially of its scientific concept and its quantification. In this paper, the energy wall is the quantification of the energy consumption scalability of parallel computing. Based on the energy-efficiency speedup, this paper proposes the energy wall theory and its proof. After analyzing the relationship between red, yellow, and green scalable systems and the energy wall, we prove that there is always an energy wall in red scalable systems, and that there is no energy wall in yellow and green scalable systems.

3. Proposing optimization techniques for network dynamic energy consumption (Chapter 4)
The energy wall theory shows that network energy consumption, which includes static energy consumption and dynamic energy consumption, is one of the key factors that bring about the energy wall. This paper optimizes the dynamic energy consumption of the network by first proposing the principle of superposition for network dynamic energy consumption, and then building an optimization model for network dynamic energy consumption based on task layout. The experiments show that the optimization can effectively reduce network dynamic energy consumption. This is the first attempt to solve the problem of the energy wall.

The major contributions of the reliability part are:

1. Proposing the reliability scalability model (Chapter 5)
As system size increases, reliability gradually decreases, which prevents large-scale parallel systems from working properly. Therefore, systems must tolerate failures by incorporating fault tolerance mechanisms to improve their reliability and availability. However, the benefits of fault tolerance mechanisms rarely come without associated overhead, such as time and cost, which limits the scalability of parallel computing to some extent. We first build the reliability speedup by considering the time overhead of fault tolerance, and then categorize systems according to the relationship between reliability and performance. Furthermore, we generalize this speedup to the general reliability speedup by incorporating the cost overhead of fault tolerance.

2. Proposing the reliability wall theory and the general reliability wall theory (Chapter 6)
Like the "Energy Wall", the "Reliability Wall" has no unified understanding at present, especially of its scientific concept and its quantification. In this paper, according to the (general) reliability scalability model, we propose the (general) reliability wall theory, as well as its proof. We analyze the relationship between the reliability wall and constant/incremental systems, and prove that there may be a reliability wall in an incremental system, and that there is no reliability wall in a constant system.

3. Proposing a scalable fault tolerance mechanism (Chapter 7)
Based on triple modular redundancy (TMR), this paper proposes a new fault tolerance mechanism that does not limit reliability scalability, i.e. a scalable fault tolerance mechanism. We first analyze the additional overhead of parallel computing with traditional TMR running on a mesh topology network, and then find the root cause that limits reliability scalability. A scalable triple modular redundancy (STMR) is then proposed to solve the problem. Finally, we verify the reliability scalability of STMR by theoretical analysis and simulation, which means the reliability wall is removed by STMR.
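
For readers unfamiliar with triple modular redundancy, the basic idea behind traditional TMR can be sketched as follows: the same computation runs on three replicas and a voter accepts the value that at least two of them agree on. This is only a minimal illustration of generic TMR voting, not the STMR mechanism of Chapter 7, whose design the abstract does not describe. The names `majority_vote` and `run_tmr` are hypothetical and chosen for this example, and the sketch assumes replica results are hashable values.

```python
from collections import Counter
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def majority_vote(results: Sequence[T]) -> T:
    """Return the value produced by at least two of the three replicas.

    Hypothetical helper for illustration; results must be hashable.
    """
    if len(results) != 3:
        raise ValueError("TMR expects exactly three replica results")
    # most_common(1) yields the most frequent value and its count.
    value, count = Counter(results).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: all three replicas disagree")
    return value


def run_tmr(replicas: Sequence[Callable[[], T]]) -> T:
    """Run three replicas of a computation and vote on their results.

    The replicas run sequentially here for simplicity; in a parallel
    system they would run on separate nodes.
    """
    return majority_vote([replica() for replica in replicas])


if __name__ == "__main__":
    # One replica returns a corrupted value; the voter masks the fault.
    healthy = lambda: 6 * 7
    faulty = lambda: 6 * 7 + 1
    print(run_tmr([healthy, faulty, healthy]))
```

The voter masks a single faulty replica, at the price of running the computation three times and collecting all three results in one place before voting.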
Keywords/Search Tags: scalability, energy wall, reliability wall, metric model, optimization techniques