
Memory Access Optimization Of ATLAS On Loongson 2F

Posted on: 2010-09-24
Degree: Master
Type: Thesis
Country: China
Candidate: B Su
Full Text: PDF
GTID: 2178360302459809
Subject: Computer system architecture
Abstract/Summary:
As an important standard mathematical library, BLAS (Basic Linear Algebra Subprograms) provides the basic matrix and vector operations. Measuring BLAS performance is a crucial means of evaluating high-performance computers. ATLAS is a general-purpose optimized implementation of BLAS. To achieve high utilization of the Loongson 2F microprocessor, this thesis optimizes ATLAS on KD-50-I, a high-performance computer based on the Loongson 2F.

First, the thesis introduces the characteristics of the Loongson architecture, especially the pipeline and memory structures, which strongly influence program performance. It then studies the data structures of ATLAS in detail, since different data structures lead to different function implementations and optimizations.

Based on the characteristics of the Loongson architecture and the computational features of ATLAS, a set of techniques is proposed to optimize the BLAS subprograms. These techniques focus mainly on memory access: overlapping computation with memory access through instruction scheduling, together with efficient use of memory and cache, yields a high-performance ATLAS.

By using loop unrolling to reduce the number of memory accesses and by using the nonblocking cache mechanism to form a memory access pipeline, the performance of the optimized BLAS2 routines improves by 30%. Prefetching, data tiling, and copying improve the temporal and spatial locality of the program and thereby reduce cache misses. After optimization, the single-precision BLAS3 functions run 80% faster, and the performance of the double-precision BLAS3 functions improves by more than 50%.

The optimization techniques adopted in this thesis are also significant for a high-performance implementation of BLAS on Loongson 3.
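To make two of the techniques named in the abstract concrete, the following is a minimal C sketch, not the thesis's actual kernels (which are not given on this page). It assumes row-major storage and double precision. The function names `gemv_unrolled4` and `gemm_tiled_copy` and the tile size `TILE` are hypothetical choices made for illustration. The first routine unrolls a BLAS2-style matrix-vector product over four rows so that each element of `x` is loaded once and reused four times. The second is a BLAS3-style matrix product that tiles the loops and copies each tile of `B` into a contiguous buffer before using it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile edge; a real value would be tuned to the cache size. */
#define TILE 32

/* y += A * x for a row-major m x n matrix A.
 * Rows are unrolled by 4: each x[j] is loaded once and feeds four
 * independent accumulators, which cuts loads of x by roughly 4x. */
void gemv_unrolled4(size_t m, size_t n,
                    const double *a, const double *x, double *y)
{
    assert(m == 0 || n == 0 || (a && x && y));
    size_t i = 0;
    for (; i + 4 <= m; i += 4) {
        const double *r0 = a + (i + 0) * n;
        const double *r1 = a + (i + 1) * n;
        const double *r2 = a + (i + 2) * n;
        const double *r3 = a + (i + 3) * n;
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        for (size_t j = 0; j < n; ++j) {
            double xj = x[j];          /* one load, four uses */
            s0 += r0[j] * xj;
            s1 += r1[j] * xj;
            s2 += r2[j] * xj;
            s3 += r3[j] * xj;
        }
        y[i + 0] += s0;
        y[i + 1] += s1;
        y[i + 2] += s2;
        y[i + 3] += s3;
    }
    for (; i < m; ++i) {               /* remainder rows, not unrolled */
        double s = 0.0;
        for (size_t j = 0; j < n; ++j)
            s += a[i * n + j] * x[j];
        y[i] += s;
    }
}

/* C += A * B for row-major n x n matrices.
 * The k and j loops are tiled, and each tile of B is copied into a
 * small contiguous buffer so that it is reused for every row of A. */
void gemm_tiled_copy(size_t n, const double *a, const double *b, double *c)
{
    double bt[TILE * TILE];            /* contiguous copy of one B tile */
    assert(n == 0 || (a && b && c));
    for (size_t kk = 0; kk < n; kk += TILE) {
        size_t kb = (n - kk < TILE) ? n - kk : TILE;
        for (size_t jj = 0; jj < n; jj += TILE) {
            size_t jb = (n - jj < TILE) ? n - jj : TILE;
            for (size_t k = 0; k < kb; ++k)        /* copy step */
                for (size_t j = 0; j < jb; ++j)
                    bt[k * TILE + j] = b[(kk + k) * n + (jj + j)];
            for (size_t i = 0; i < n; ++i) {       /* compute on the tile */
                for (size_t k = 0; k < kb; ++k) {
                    double aik = a[i * n + kk + k];
                    for (size_t j = 0; j < jb; ++j)
                        c[i * n + jj + j] += aik * bt[k * TILE + j];
                }
            }
        }
    }
}
```

Both routines accumulate into their output, so the caller is expected to initialize `y` or `c` first. The sketch does not attempt software prefetching or instruction scheduling, which depend on the target compiler and processor.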
Keywords/Search Tags: Loongson2F, BLAS, ATLAS, loop unrolling, instruction scheduling, data prefetching, nonblocking cache, cache miss