Font Size: a A A

Testing Analyzing And Studying Theoretically Of The Algorithm Of Collective Communications

Posted on:2006-03-20Degree:MasterType:Thesis
Country:ChinaCandidate:Y LiuFull Text:PDF
GTID:2168360152987482Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
As the application and development of the high performance computer, Large -scale distributed computation has become an indispensable tool in developing the science and technology of our country. However, the most important problem of the current high performance computer is that we can not achieve the satisfied performance. Especially we can not achieve the expected absolute speed and efficiency when we run the science computing and engineering application software on the high performance computing with hundreds of and thousands of processors.One of the bottleneck, which impact the performance of the parallel programs, is irrationally designing or using the message passing module, which leads to the high frequency of the global communication or the congestion of the network. According to the data analyzing, the cost of the collective communication is more than 80 percent of which of the whole communication. Apparently, there are two methods to improve the communication performance of the application software, one is to reduce the frequency of the communication, the other is to improve the communication performance of the cluster. Thereby, studying of the efficiency of the collective communication is the emphases of this paper.Collective communication includes many kinds of global communication, such as Broadcast, Barrier, Reduce, Scatter, Gather etc. And there are a lot of algorithms for each kind of global communication; some algorithm can fit for different global communications. Every kind of collective communication belongs to one mode of the following three modes: one-to-all, all-to-all and all-to-one. Further more, the implementation of some collective communication depends on the combination of other collective communications. As a result, the representative Reduce is selected as the research object in this paper.We can study and compare the collective communications through theoreticallyanalyzing and experiment. Experiment method mainly includes testing the different algorithms for the same communication on different clusters and comparing the results. And theoretically analyzing is mainly using the parallel computing model to study the time complexity of the algorithms for the collective communication. In order to study the algorithms for the collective communications correctly, an accurate communication model is needed. Collective communications are usually based on point-to-point communication. So we can use point-to-point models to characterize these collective communications. An accurate model can provide necessary evidences for quantitative analysis, which helps to design and analyze for algorithms. In this paper, the LogGP model and the simplified ct+nP model have been chosen to compare the time complexity of the collective communications. These two models both are based on point-to-point communication, so the relation between them must be existed. In this paper, the relationship between the two models is established through the study of them.However, using the two models to characterize the time complexity of the collective communications can not achieve the satisfied result. Especially when characterizing the algorithm for the long message communications, the relative error is great. Sometimes it is over 50 percent. In order to study the error, a new model, Send/Recv model, is defined. The relative error of the new model is much less than which of the above two models. Through studying the new model and analyzing the above two models, the reason has been found. After improving the above two models the relative error of which has been lessened.In short the main work of this paper includes establishing the relationship of LogGP model and α+nβ model through analyzing the collective communication Reduce theoretically and experimentally, and improving the LogGP model and a+nP model when characterizing the time complexity of the collective communications. These are contributed to the studying and designing of the collective communications.
Keywords/Search Tags:Algorithm of collective communication, Reduce, LogGP Model, α+nβ Model, Send/Recv model
PDF Full Text Request
Related items