GPGPU Accelerated Massive Parallel Design Of Long Wave Radiation Process In GRAPES-GLOBAL Model

Posted on: 2013-08-27
Degree: Master
Type: Thesis
Country: China
Candidate: M Guo
Full Text: PDF
GTID: 2230330374954968
Subject: Science of meteorology
Abstract/Summary:
In recent years, with the rapid advance of GPGPU (general-purpose graphics processing unit) technology, leveraging the massively parallel processing power of GPGPUs to provide supercomputing capacity has become a new trend, and GPGPUs are now applied in many fields of scientific computing. GRAPES (Global/Regional Assimilation and Prediction System) is a new-generation multi-scale numerical model developed by the Chinese Academy of Meteorological Sciences, and it plays an important role in weather forecasting and research. The long-wave radiation process is one of the most important physics processes in the GRAPES_GLOBAL model and takes up a large share of its processing time, so it affects the computing efficiency of the whole model.

Since this process can be partitioned into tiles within the horizontal plane, it lends itself to a naturally parallel scheme. This thesis first introduces the advantages of GPU computing. A GPU has hundreds of stream processors on one chip, which enables it to handle thousands of hardware threads simultaneously and gives a much higher theoretical throughput, over 1 TFLOPS per chip. GPUs also come with an integrated set of supporting tools, from compilers to libraries, which facilitates development. Next, the long-wave radiation computation is described and analyzed. While the high-level MPI communication is kept unchanged, a low-level, fine-grained parallel architecture is designed to harness the computing power of the new hardware. This massively parallel implementation is based on NVIDIA GPGPUs and CUDA technology. Rather than looping through a large portion of the atmospheric columns, as conventional CPU-based systems do, the new GPU-based implementation uses each small core to process a single column.
This scheme has three major advantages: much higher thread concurrency, use of the larger memory bandwidth of the GPU, and higher computational intensity with better efficiency.

Finally, experiments with a real-life dataset are performed and the correctness of the new design is validated. Counting only the GPGPU computation time, the Tesla C1060 achieves an 11x speedup over a high-end x86 CPU and the Tesla C2050 a 13x speedup, which greatly improves execution speed and forecasting performance. Counting the whole time, including both data transfer and GPGPU computation, the Tesla C1060 and Tesla C2050 achieve only 5.9x and 6.1x, respectively. Data transfer is therefore the biggest bottleneck in the parallel experiments. Timings of the subroutines and of data transfer are also recorded and compared, and different partition configurations are tried to find the best combination. In addition, asynchronous execution, which overlaps computation with data transfer, is used to hide the latency; this experiment achieves an 8.97x speedup over a high-end x86 CPU.

The experiments show that GPGPUs have good potential to improve numerical weather forecasting models. As more and more routines are ported to GPU systems, a much better speedup could be achieved over the whole model.
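
The speedup figures above allow a rough estimate of how much of the end-to-end GPU time is spent outside the kernel. The sketch below is a back-of-the-envelope calculation, not the thesis's own timing analysis. It assumes that both speedups for a card are measured against the same CPU baseline, and that the end-to-end time is simply kernel time plus transfer time with no overlap. The function name `transfer_share` is hypothetical.

```python
def transfer_share(kernel_speedup, end_to_end_speedup):
    """Estimate the fraction of end-to-end GPU time spent outside the kernel.

    Assumption: times are normalized so the CPU baseline takes 1.0 unit,
    and end-to-end time = kernel time + transfer time (no overlap).
    """
    kernel_time = 1.0 / kernel_speedup
    total_time = 1.0 / end_to_end_speedup
    return (total_time - kernel_time) / total_time


# Speedups quoted in the abstract: (kernel only, kernel plus data transfer)
reported = {
    "Tesla C1060": (11.0, 5.9),
    "Tesla C2050": (13.0, 6.1),
}

shares = {gpu: transfer_share(k, e) for gpu, (k, e) in reported.items()}
for gpu, share in shares.items():
    print(f"{gpu}: about {share:.0%} of end-to-end time is outside the kernel")
```

Under these assumptions, roughly half of the end-to-end time on either card goes to data transfer, which is consistent with the abstract's description of transfer as the biggest bottleneck.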
Keywords/Search Tags: GPGPU, Numerical Weather Forecasting Model, Long-wave radiation