Studies On High Performance Parallel Computing Of GRAPES' Tangent & Adjoint Model

Posted on:2011-11-23

Degree:Master

Type:Thesis

Country:China

Candidate:D S Ren

Full Text:PDF

GTID:2178330338990131

Subject:Computer Science and Technology

Abstract/Summary:

Four-dimensional variational assimilation is one of the key technologies of numerical weather prediction. It takes the time distribution of the observed data into account to improve the quality of the initial data, which determines the quality of the forecast. Because it can assimilate observations from different times, different regions, and of different types, it is currently regarded internationally as the most effective data assimilation scheme. However, it is computationally very expensive and time-consuming. GRAPES (Global/Regional Assimilation and Prediction System) is a new-generation numerical weather prediction system developed independently in China. Its four-dimensional variational assimilation system, GRAPES-4DVAR for short, shares these characteristics: a large amount of computation, high memory demand, and long run times. The focus of this thesis is how to reduce the elapsed time by improving code efficiency, changing the algorithm, and enhancing parallel scalability. Specifically, it addresses how to gain performance through code optimization, how to analyze quantitatively the effect of different approaches on program performance, and how to use a hybrid parallel mode to increase the scalability of parallel computing. The main work is summarized as follows:

(1) Tuned and optimized the GRAPES regional model code. The work focused on improving the performance of the memory system and the basic components of the processor, analyzed the causes of pipeline stalls, and removed the code bottlenecks that had a significant impact on run-time performance. As a result, the nonlinear model gained a 25% performance improvement.

(2) Proposed an intermediate solution between the checkpointing strategy and the store-all strategy, trading an increase of about 30% in memory cost for a 100% performance increase.

(3) Proposed a technique for managing data blocks in memory that supports both First In First Out and First In Last Out access. A Nested Multi-Chained Stack was implemented, which meets the needs of the improved adjoint algorithm well.

(4) Improved the input/output behavior that limited parallel performance. By comparing the maximum number of iterations the adjoint model can run with the number actually required, the thesis determines which method gives the best performance while meeting actual needs for a fixed computation scale and a fixed number of processors. It also shows that, when the number of processors exceeds 128, using limited memory space in place of reading and writing external storage reduces wall-clock time by up to 70%.

(5) Implemented hybrid parallel computation. On the prevailing architecture of modern cluster systems, using thread-level parallelism through OpenMP within a node and message passing through MPI between nodes yields excellent parallel performance and scalability. The results show that, once the parallel efficiency of pure MPI drops below 90%, the hybrid mode improves parallel efficiency by 5% to 10% over pure MPI. Finally, the advantages and disadvantages of static data partitioning among threads are analyzed.
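
The memory-versus-recomputation trade-off behind item (2) can be illustrated with a minimal sketch. It is not the thesis's algorithm or GRAPES code: the scalar model `step`, its adjoint `adjoint_step`, and the fixed checkpoint interval `every` are all assumptions chosen for illustration. Store-all keeps every forward state for the backward (adjoint) sweep, while periodic checkpointing keeps only some states and recomputes the rest segment by segment.

```python
import math

def step(x, dt=0.1):
    """One forward step of a toy scalar nonlinear model (hypothetical)."""
    return x + dt * math.sin(x)

def adjoint_step(x, lam, dt=0.1):
    """Adjoint of `step`, linearized about the forward state x."""
    return (1.0 + dt * math.cos(x)) * lam

def adjoint_store_all(x0, n):
    """Store-all: keep every forward state, never recompute."""
    states = [x0]
    for _ in range(n):
        states.append(step(states[-1]))
    lam = 1.0
    for k in reversed(range(n)):
        lam = adjoint_step(states[k], lam)
    # gradient d x_n / d x_0, states held in memory, extra forward steps
    return lam, len(states), 0

def adjoint_checkpointed(x0, n, every):
    """Periodic checkpointing: keep every `every`-th state, recompute the rest."""
    checkpoints = {}
    x = x0
    for k in range(n):
        if k % every == 0:
            checkpoints[k] = x
        x = step(x)
    lam = 1.0
    recomputed = 0
    peak = len(checkpoints)
    # Walk the segments backward, rebuilding each one from its checkpoint.
    for start in sorted(checkpoints, reverse=True):
        end = min(start + every, n)
        segment = [checkpoints[start]]
        for _ in range(end - start - 1):
            segment.append(step(segment[-1]))
            recomputed += 1
        # The checkpoint itself is shared, so it is counted only once.
        peak = max(peak, len(checkpoints) + len(segment) - 1)
        for x_k in reversed(segment):
            lam = adjoint_step(x_k, lam)
    return lam, peak, recomputed

if __name__ == "__main__":
    print(adjoint_store_all(0.5, 100))
    print(adjoint_checkpointed(0.5, 100, every=10))
```

Both functions return the same gradient. They differ only in how many states are held in memory and how many forward steps are repeated, which is the quantity an intermediate strategy would tune.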

Keywords/Search Tags:

GRAPES, Regional, Tangent model, Adjoint model, Parallel computing

Related items

1	Studies On High Performance Parallel Computing Of GRAPES' Tangent & Adjoint Model
2	Research On High Performance Of GRAPES Tangent/Adjoint Model With The MPI/OpenMP
3	The Parallel Computing Of Adjoint Models Of Variational Data Assimilation In Numerical Weather Forecasting
4	Study Of GRAPES Numerical Weather Prediction System Optimization On Domestic High Performance Computers
5	Study And Implementation Of Parallel Computing For Three-Dimension Variational Assimilation
6	The Research On Implement Technique Of Automatic Differentiation Tools
7	Design Of Parallel Optimization Algorithm And Software And Port Of Numerical Software
8	Bands Matrix Parallel Computing Research Based On Heterogeneous Systems
9	The Hybrid MPI+OpenMP Parallel Method Of GRAPES-global Model
10	Study On Inverse Problems In Electromagnetics Based On Genetic Algorithm And Parallel Computing