Design And Implementation Of Indirect Prefetch Algorithm Based On Shenwei GCC Compiler

Posted on: 2022-12-13  Degree: Master  Type: Thesis
Country: China  Candidate: L L Yu
GTID: 2518306755460814  Subject: Computer Science and Technology
Abstract/Summary:
In modern microprocessors, caches are widely used to alleviate the large speed gap between the processor and memory. The domestic Sunway-1621 multi-core processor mainly targets high-performance computing and mid-to-high-end servers, and its companion compiler, SWGCC (Sunway GNU Compiler Collection), supports a variety of high-level programming languages and has good portability. However, software data prefetching support in the current SWGCC compiler is incomplete, and the compiler lacks optimization strategies that insert prefetches automatically. The goal of this thesis is to study software prefetching based on the SWGCC compiler. The main contributions are as follows:

1. An indirect prefetching optimization pass is designed and implemented in the SWGCC compiler. Inside the compiler, a depth-first search is used to find indirect memory references that depend on the loop induction variable, and the information needed to compute the prefetch address is stored in a cross-linked list data structure. To load data needed by future loop iterations, the forward prefetch distance is calculated as the ratio of the product of the total number of memory references in the indirect prefetch sequence and the system memory bandwidth to the estimated execution time of the loop body after prefetch insertion.

2. To avoid faults caused by the intermediate loads in indirect prefetch address calculations, two strategies are proposed. First, an address out-of-bounds check is added to the software prefetch code to restrict the induction variable to valid values. Second, loops containing indirect memory accesses are analyzed, and data prefetching is performed for an indirect memory access only if no store to its index memory reference is found.

3. To keep inefficient and useless prefetches from reducing program performance, two cost models are defined to judge whether a given loop satisfies the basic conditions for indirect prefetching. In the loop iteration count model, the ratio of the estimated loop iteration count to the forward prefetch distance is compared with a preset value. If the ratio is smaller than the preset value, the loop runs too few iterations for the memory access latency to be hidden. In the second model, which covers the overlap between CPU operations and memory access operations, the ratio of the estimated number of instructions after prefetch insertion to the total number of memory references in the indirect prefetch sequence is compared with a preset minimum ratio to gauge how memory-bound the loop's indirect prefetch sequence is. If the ratio is smaller than the preset minimum, the loop does not contain enough CPU work for prefetching to pay off.

The experimental data show that indirect software prefetching for the Sunway-1621 processor architecture improves overall system performance. Running the indirect prefetch algorithm on the memory-bound IS, CG, and Graph500 (Seq-CSR) benchmarks yields an average performance improvement of 16% on the Sunway-1621 processor.
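The transformation described above can be illustrated by hand in C. The following is only a minimal sketch under stated assumptions, not the SWGCC pass itself: the function names (`sum_indirect`, `enough_iterations`, `enough_cpu_work`) and the two threshold constants are hypothetical, since the abstract does not give the preset values. It uses GCC's `__builtin_prefetch` builtin. The look-ahead index is clamped so that the intermediate load `index[ahead]` never reads out of bounds, and the loop contains no store to `index`, which matches the two safety strategies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical thresholds: the abstract does not state the preset values. */
#define MIN_ITERS_PER_DISTANCE 4
#define MIN_INSNS_PER_MEMREF   3

/* Loop iteration count model: accept a loop only if the estimated
 * iteration count is at least MIN_ITERS_PER_DISTANCE times the distance. */
int enough_iterations(size_t est_iters, size_t distance)
{
    return distance != 0 && est_iters >= MIN_ITERS_PER_DISTANCE * distance;
}

/* CPU/memory overlap model: accept a loop only if it has at least
 * MIN_INSNS_PER_MEMREF estimated instructions per memory reference. */
int enough_cpu_work(size_t est_insns, size_t mem_refs)
{
    return mem_refs != 0 && est_insns >= MIN_INSNS_PER_MEMREF * mem_refs;
}

/* Sums data[index[i]] for i in [0, n), prefetching the element that will
 * be needed `distance` iterations later. Every index[i] must be a valid
 * subscript of data[]. */
long sum_indirect(const long *data, const int *index, size_t n, size_t distance)
{
    long sum = 0;

    assert(n == 0 || (data != NULL && index != NULL));

    for (size_t i = 0; i < n; i++) {
        /* Out-of-bounds check: clamp the look-ahead induction variable to
         * the last valid position, because index[ahead] is a real load. */
        size_t ahead = (distance < n - i) ? i + distance : n - 1;

        /* Read-only prefetch (rw = 0) with high temporal locality (3). */
        __builtin_prefetch(&data[index[ahead]], 0, 3);

        sum += data[index[i]];
    }
    return sum;
}
```

A caller would evaluate the two predicates first and fall back to the plain loop when either returns 0. In the compiler these decisions are made at compile time from estimates; here they are ordinary functions only to make the comparisons concrete.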
Keywords/Search Tags:Indirect memory access, Sunway processor, GCC, Software prefetch