
Offloading And Optimization Of Collective Communication Operations

Posted on: 2021-11-06
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhang
Full Text: PDF
GTID: 2518306548490994
Subject: Master of Engineering
Abstract/Summary:
Collective communication is widely used in high-performance computing (HPC) research and engineering. Studies show that it accounts for a large share of the running time of scientific applications, and in some applications collective communication makes up more than 70% of total communication time, which makes it a performance bottleneck for HPC systems. Although software-based implementations of collective operations have many advantages, they can no longer meet the growing performance requirements of HPC applications. Programmable network hardware has made in-network computing and in-network caching possible. Applied to the interconnection network of a heterogeneous parallel computing system, these techniques let the network process data while it is in transit, which reduces network load and improves parallel computing efficiency. Adding hardware support for collective communication inside the interconnection network, and combining it with software scheduling on the end nodes, has therefore become an active research topic. At present, no company or research institution in China has applied a combined software and hardware scheme for accelerating collective communication in real engineering practice.

This thesis uses a simulator to explore the engineering implementation of a collective communication acceleration scheme based on router offload. Its main results are as follows. In the xNetSimPlus simulator, which is built on the OMNeT++ platform, functionality from a real MPI library is ported into the router, so that the router can parse collective communication packets on its own. On this basis, an efficient scheme for establishing the logical tree used by collective communication is proposed. The scheme fits the physical topology as closely as possible while the tree is being built and minimizes the communication overhead introduced by mapping the logical tree onto the physical topology, thereby reducing the time required for collective operations. Simulation results show that the router offloading scheme accelerates the Bcast operation by 910%, the Reduce operation by 790%, the Allreduce operation by 830%, and the Gather operation by 650%.
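The idea of a logical tree that follows the physical topology can be illustrated with a minimal sketch. This is not the scheme from the thesis, whose details the abstract does not give. It assumes a hypothetical topology described as an adjacency map and builds a breadth-first spanning tree, so that every logical tree edge is a single physical link. The second function then combines values at each intermediate node on the way to the root, as an offloading router might do for a Reduce. The names `build_collective_tree`, `tree_reduce`, and the example topology are all illustrative assumptions.

```python
from collections import deque


def build_collective_tree(adjacency, root):
    """Return {node: parent} for a BFS spanning tree rooted at `root`.

    Every tree edge is also a physical link, so the logical tree adds no
    extra hops on top of the physical topology (illustrative assumption).
    """
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return parent


def tree_reduce(parent, values, op=lambda a, b: a + b):
    """Combine the contributed values hop by hop toward the root.

    Nodes without an entry in `values` (for example routers) only forward
    and combine what their children send.
    """
    children = {node: [] for node in parent}
    root = None
    for node, par in parent.items():
        if par is None:
            root = node
        else:
            children[par].append(node)

    # Top-down BFS order; reversing it visits children before parents.
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(children[node])

    partial = dict(values)
    for node in reversed(order):
        par = parent[node]
        if par is None or node not in partial:
            continue
        if par in partial:
            partial[par] = op(partial[par], partial[node])
        else:
            partial[par] = partial[node]
    return partial[root]


# Hypothetical topology: two routers, each with two attached hosts.
topology = {
    "r0": ["r1", "h0", "h1"],
    "r1": ["r0", "h2", "h3"],
    "h0": ["r0"],
    "h1": ["r0"],
    "h2": ["r1"],
    "h3": ["r1"],
}
tree = build_collective_tree(topology, "r0")
total = tree_reduce(tree, {"h0": 1, "h1": 2, "h2": 3, "h3": 4})
```

In this sketch router `r1` combines the values from `h2` and `h3` before forwarding, so only one partial result crosses the `r1`-`r0` link instead of two host messages.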
Keywords/Search Tags: Interconnection network, Collective communication, Combination of hardware and software, Router