Font Size: a A A

Research Of Hardware-Based Collective Communication On K-ary N-tree

Posted on:2009-03-21Degree:DoctorType:Dissertation
Country:ChinaCandidate:Q B SunFull Text:PDF
GTID:1118360278956580Subject:Electronic Science and Technology
Abstract/Summary:PDF Full Text Request
Collective communication is a communication pattern which involves a group of processes, and it is used commonly in scientific and engineering computing. According to the data analyzing, the cost of the collective communication accounts for 80% of the cost of whole communication and 60% of the executing time in some large-scale paral-lel programs. Collective Communication has become a performance bottleneck for high performance computing system. The traditional software-based approach is not suit for the performance requirement. How to support collective communication at hardware level becomes an important project.The interconnection network based collective communication (INCC) has been re-searched widely and deeply since being proposed. Most of the existing reserches mainly concern with the implementation of INCC, such as the function extend of the router, the addressing and routing of the packet, and the solving of the deadlock. The INCC per-forms one-to-many and many-to-one operation on the router, which increases the packet conflict and makes the performance of INCC more sensitive to packet conflict than that of unicast. Therefore, to further improving the practical performance of INCC, two is-sues should be considered. One is how to reduce packet conflict, and the other is how to solve the packet conflict that has occurred. Detailed research works have been done on these two issues based on k-ary n-tree. The main contributions are as follows:1. The topology characteristic of k-ary n-tree is researched deeply. An equivalence relation on routers of k-ary n-tree is defined, and the range of routers that are passed through by the same path or the same collective communication tree is formularly described, which lay the foundation for load balance on k-ary n-tree.2. A policy purchasing global load balance of collective communication is proposed. At the upwarding phase of spanning tree building algorithm, this policy uses the collective communication load of each equivalence class as the criteria of the least load parent selecting, which guarantees the balanced distribution of collective communication load and reduces the conflict among collective communication packets.3. The occurrence and spread of multicast congestion is analysised, and the look-ahead adaptive routing algorithm for unicast is proposed. At the upward phase of unicast packet transmission, this algorithm forecasts the multicast con-gestion according to the multicast load of current router and controls the output port selecting of unicast packet by multicast congestion based on the forecast re-sult, which can prevent unicast packets passing through the router where multicast congestion exist and alleviate the degradation of network performance caused by multicast congestion.4. Based on the fact that high performance computer is sensitive to network latency, a multicast scheduling algorithm name MSFS (Maxed Served First Served) is proposed. To minimize the transmitting latency of multicast packet on the router, MSFS assigns priority to a multicast packet according to the number of output ports which have received the packet and the waiting time of the packet at the head of buffering queue. MSFS gives attention to network throughput and packet latency simultaneously. If the unicast packet is regarded as multicast packet with one destination port, MSFS is equal to FCFS unicast scheduling algorithm. Therefore, MSFS can be integrated FCFS smoothly on network where unicast and multicast coexists.5. The reduction-latency models with conflict and non-conflict are made, and the impact of different kinds of packet conflict on the reduction latency is analyzed. It is demonstrated that the residue concentration scheduling algorithm can achieve minimum average latency of multiple reduction operation on the router. The FCFS policy is prefered to solve the conflict among reduction packets to reduce the impact of reduction operation being called later on the operation being called earlier. The reduction packet first served policy is prefered to solve the conflict among unicast packets and reduction packets to reduce the impact of unicast on the reduction latency.The simulator for multistage interconnection networks named MINSimCC is im-plemented based on the discrete event-driven simulation platform OMNeT++. The si-mulation results show that all the proposed load balance polocies and packet scheduling polocies can achieve their own goals effectively.
Keywords/Search Tags:High-performance computing, Interconnection networks, k-ary n-tree, Collective communication, Multicast, Reduction, Load bal-ance, Packet scheduling
PDF Full Text Request
Related items