Congestion Avoidance Batch Point-to-point Communication Parallel Scheduling Method

Posted on: 2021-04-05
Degree: Master
Type: Thesis
Country: China
Candidate: J T Peng
GTID: 2428330602997329
Subject: Computer software and theory
Abstract/Summary:
Large-scale numerical simulation plays an important role in major fields such as weapon physics, laser fusion, electromagnetic environments, engineering mechanics, and materials science. It is the "third pillar" of scientific research, alongside theory and experiment. In recent years, as numerical simulation applications have grown more complex and parallel scale has kept increasing, application software has shown an increasingly significant data communication bottleneck, which severely restricts its parallel scalability and execution performance. High-performance computing has entered the peta-scale era and has begun to move towards exa-scale computing. However, compared with the ten-thousand-fold increase in floating-point computing capability, the improvement in data communication performance has been far from proportional, which has further aggravated the communication bottleneck of large-scale numerical simulation applications.

This thesis focuses on batch point-to-point communication, which is widely used in parallel science and engineering applications. As described in the literature, point-to-point communication accounts for 90% of the communication operations in typical scientific and engineering applications. Since most numerical simulation applications are parallelized with the bulk synchronous parallel (BSP) model, their communication is often carried out in a batch point-to-point manner. Batch point-to-point communication transmits a large number of message streams with different sources, destinations, and lengths at the same time. This causes significant network congestion on modern communication networks, which in turn reduces network performance and hurts application scalability and execution performance.

The main contribution of this thesis is a parallel scheduling method for batch point-to-point communication. It comprises a congestion model that quantitatively describes the congestion behavior of batch point-to-point communication, a congestion-avoidance scheduling algorithm for batch point-to-point communication, and a method for parallelizing that scheduling algorithm. The main content of the thesis is as follows:

(1) A batch point-to-point congestion model with both low simulation overhead and high accuracy is proposed. Efficient and fast modeling of batch point-to-point communication performance is of great significance for analyzing and optimizing the communication performance of large-scale programs and for software-hardware co-design. Existing cycle-accurate models usually require substantial computing resources, while analytical models often oversimplify and cannot describe congestion accurately. To solve these problems, this thesis proposes a model based on packet-level simulation and proves that the model is theoretically equivalent to the flow-fair model, so that it can model network congestion in batch point-to-point communication with low complexity. Based on this model, the thesis designs and implements a network congestion simulator accelerated by dynamic time-step simulation. Experiments show that the model can accurately and quickly predict batch point-to-point communication performance and characterize its network congestion.

(2) A congestion-avoidance scheduling method for batch point-to-point communication is proposed. Communication performance is crucial to the scalability and performance of massively parallel applications. To address the congestion caused by communication contention on modern high-performance computers, this thesis proposes an application-layer communication scheduling method that reduces network contention at the source of the communication. In this method, messages are grouped and time-scheduled, so that the messages of a batch communication request submitted all at once are injected into the network in a scheduled order, which reduces communication contention and improves performance. Experimental results show that this method can significantly improve batch point-to-point communication performance.

(3) A parallel congestion-avoidance scheduling method for batch point-to-point communication is proposed. The congestion-avoidance scheduling method requires global information such as the communication graph and the network topology, and it becomes a serial bottleneck in multi-core, multi-node parallel computing. Because the only correctness requirement for batch point-to-point communication scheduling is that every message is correctly delivered to its destination, a distributed scheduling method that uses only local information can be considered, making full use of the computing power of the parallel machine to reduce the algorithm overhead. This thesis explores a parallel scheduling algorithm based on the idea of avoiding NIC congestion, studies its distributed parallel strategy, and analyzes the parallelism of the two communication scheduling methods.
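As a rough illustration of the "grouped and time-scheduled" idea in contribution (2), the Python sketch below greedily assigns messages to rounds so that, within one round, no node sends more than one message or receives more than one message. This is not the thesis's algorithm, which the abstract does not specify. The function name `schedule_rounds` and the `(src, dst, size)` tuple layout are hypothetical, and the sketch considers only contention at the endpoints (NICs), ignoring network topology and routing.

```python
def schedule_rounds(messages):
    """Greedily group (src, dst, size) messages into rounds.

    Assumption for illustration only: a round is "contention-free" when
    each node appears at most once as a sender and at most once as a
    receiver. Links inside the network are not modeled.
    """
    rounds = []      # rounds[i] is the list of messages injected in round i
    senders = []     # senders[i] is the set of source nodes busy in round i
    receivers = []   # receivers[i] is the set of destination nodes busy in round i

    # Place long messages first so that messages of similar length
    # tend to end up in the same round.
    for src, dst, size in sorted(messages, key=lambda m: -m[2]):
        # Look for the earliest round in which both endpoints are free.
        for i in range(len(rounds)):
            if src not in senders[i] and dst not in receivers[i]:
                break
        else:
            # No existing round fits, so open a new one.
            rounds.append([])
            senders.append(set())
            receivers.append(set())
            i = len(rounds) - 1
        rounds[i].append((src, dst, size))
        senders[i].add(src)
        receivers[i].add(dst)
    return rounds
```

Calling `schedule_rounds([(0, 1, 100), (0, 2, 50), (1, 2, 80), (2, 0, 10)])` places the two messages sent by node 0 in different rounds, so they never compete for the same sending NIC. A real scheduler would also need the network topology and message timing that the abstract mentions.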
Keywords/Search Tags: Batch Point-to-point, Network Congestion, Communication Scheduling, Network Model, High Performance Computing