Studies On High Performance Parallel Computing Of GRAPES' Tangent & Adjoint Model

Posted on: 2011-11-23
Degree: Master
Type: Thesis
Country: China
Candidate: D S Ren
GTID: 2178330338990131
Subject: Computer Science and Technology
Abstract/Summary:
Four-dimensional variational assimilation (4DVAR) is one of the key technologies of numerical weather prediction. It takes the time dependence of observational data into account to improve the quality of the initial data, which determine forecast quality. Because it can assimilate observations from different times, different regions, and of different types, it is currently regarded internationally as the most effective data assimilation scheme. However, it is computationally complex and requires a large amount of computation and time. GRAPES-4DVAR, the four-dimensional variational assimilation system of GRAPES (Global/Regional Assimilation and Prediction System), a new-generation numerical weather prediction system developed independently in China, shares these characteristics: heavy computation, large memory requirements, and long running times. This thesis focuses on reducing elapsed time by improving code efficiency, changing the algorithm, and enhancing parallel scalability. Specifically, it examines how to gain performance through code optimization, how to quantitatively analyze the impact of different approaches on program performance, and how to use a hybrid parallel mode to increase the scalability of parallel computing. The main work is summarized as follows:
(1) Adjusted and optimized the GRAPES regional model code, focusing on improving the performance of the memory system and the basic components of the processor. The causes of pipeline stalls were analyzed, and the code bottlenecks with a significant impact on run-time performance were removed. As a result, the nonlinear model gained a 25% performance improvement.
(2) Proposed an intermediate scheme between the checkpointing strategy and the store-all strategy, trading an increase of about 30% in memory cost for a 100% performance improvement.
(3) Proposed a technique for managing in-memory data blocks that supports both first-in-first-out and first-in-last-out access. A Nested Multi-Chained Stack was implemented, which meets the needs of the improved adjoint algorithm well.
(4) Addressed the input/output bottleneck in parallel performance. By comparing the maximum number of iterations the adjoint model can run with the number actually required, determined which method delivers the best performance while meeting practical needs for a fixed problem size and a fixed number of processors. The results also show that, when more than 128 processors are used, replacing reads and writes to external storage with a limited amount of memory reduces wall-clock time by up to 70%.
(5) Implemented a hybrid mode of parallel computation. On the prevailing architecture of modern cluster systems, using thread-level parallelism through OpenMP within a node and message passing through MPI between nodes yields excellent parallel performance and scalability. The results show that once the parallel efficiency of the pure MPI mode drops below 90%, the hybrid mode improves parallel efficiency by 5% to 10% over it. Finally, the advantages and disadvantages of static data partitioning among threads are analyzed.
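The trade-off behind item (2) can be illustrated with a minimal sketch. It is not the scheme developed in the thesis: the toy model `step`, its adjoint `step_adjoint`, and the fixed checkpoint `interval` are all hypothetical and chosen only to show how storing fewer forward states costs extra recomputation in the adjoint sweep.

```python
def step(x):
    # Hypothetical toy nonlinear model step (stands in for one forecast time step).
    return x + 0.1 * x * x


def step_adjoint(x, adj):
    # Adjoint of the linearized step, evaluated at the stored forward state x.
    return (1.0 + 0.2 * x) * adj


def adjoint_store_all(x0, n):
    """Keep every forward state in memory; no recomputation in the reverse sweep."""
    states = [x0]
    for _ in range(n - 1):
        states.append(step(states[-1]))
    forward_calls = n - 1
    adj = 1.0
    for x in reversed(states):
        adj = step_adjoint(x, adj)
    return adj, forward_calls, len(states)


def adjoint_checkpointed(x0, n, interval):
    """Keep every `interval`-th state; recompute each segment when it is needed."""
    checkpoints = {}
    x = x0
    forward_calls = 0
    for i in range(n):
        if i % interval == 0:
            checkpoints[i] = x
        if i < n - 1:
            x = step(x)
            forward_calls += 1
    adj = 1.0
    peak_states = len(checkpoints)
    # Walk the segments from last to first, rebuilding each from its checkpoint.
    for start in sorted(checkpoints, reverse=True):
        end = min(start + interval, n)
        segment = [checkpoints[start]]
        for _ in range(end - start - 1):
            segment.append(step(segment[-1]))
            forward_calls += 1
        peak_states = max(peak_states, len(checkpoints) + len(segment))
        for xs in reversed(segment):
            adj = step_adjoint(xs, adj)
    return adj, forward_calls, peak_states
```

Both functions return the same adjoint value. For n = 12 and interval = 4, this sketch holds at most 7 states instead of 12, at the cost of 20 forward steps instead of 11. An intermediate scheme of the kind item (2) describes would sit between these two extremes.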
Keywords/Search Tags: GRAPES, Regional, Tangent model, Adjoint model, Parallel computing