Font Size: a A A

Research On Efficient Collective Communication Algorithms Of Interconnection Networks For Multicomputers

Posted on:2007-01-30Degree:DoctorType:Dissertation
Country:ChinaCandidate:G LiuFull Text:PDF
GTID:1118360185951475Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
The development of modern science and life has resulted in increased demand for powerful computing capacity, and the design of parallel systems that have the speed of teraflops and even petaflops requires high performance interconnection networks to interconnect numerous processors. Meanwhile, with the growing of the system size, interprocessor communication overhead more and more become a bottleneck. In large-scale scientific computing and engineering applications, the overhead of collective communication usually takes absolute majority of the entire communication overhead. Thus, the research on interconnection networks and collective communication is crucial to increase the performance of multicomputers so as to improve the execution efficiency of parallel applications.The research works presented in this dissertation are mainly concentrated on improving the performance of collective communication operations on interconnection networks.First of all, this dissertation extensively studied the algorithms for complete exchange on single-port torus. Torus has become more and more popular for interconnecting the processors in parallel and distributed systems due to its topological features, and has been widely adopted by many supercomputers today. Moreover, complete exchange, also known as all-to-all personalized communication, occurs in numerous important applications in parallel computing environment. This dissertation proposes a novel network partitioning scheme on multidimensional single-port torus. By adopting the network partitioning scheme, a near-optimal (in terms of transmission time) indirect algorithm has been presented on multidimensional single-port torus. Compared with other existing algorithms, the algorithm for complete exchange not only has a better scalability, but also prominently improves the communication performance.Secondly, this dissertation improves the algorithm for complete exchange, which has minimum startup cost on single-port 2D and 3D torus, by adopting a bottom-up-send-back approach to scheduling parallel communication. Compared with the original algorithm, the improved algorithm not only achieves minimum startup cost, but also has better communication performance.Moreover, there are few researches that have been done for complete exchange on all-port torus. This dissertation firstly proposes indirect algorithms for complete exchange on all-port 1D ring, 2D and 4D torus, which completely achieve the theoretical lower bounds on message transmission. The new algorithms fully utilize all ports and communication links in all-port torus, and transmit messages along shortest paths. The analytical results show that the proposed algorithms on all-port torus have the best communication performance among all existing algorithms, when the message size is enough long.The dissertation proposes the direct algorithm DCE and indirect algorithm MCCE for complete exchange on clusters connected by Ethernet switched hierarchical network. The two algorithms fully utilized the bandwidth in the bottleneck links and theoretically achieve the lower bounds on message transmission. In addition, the algorithm MCCE significantly reduces message startup cost and synchronization cost so as to further improve the communication performance of complete exchange. Experimental results show that the two proposed algorithms significantly outperform other complete exchange algorithms included in MPICH and LAM/MPI, on Ethernet switched...
Keywords/Search Tags:Interconnection Network, Collective Communication, Torus, All-to-all Personalized Communication, Complete Exchange, Cluster, Wide-sense Nonblocking, Multicast, Multistage Interconnection Network (MIN), Parallel Communication Model
PDF Full Text Request
Related items