
Compiler techniques for evaluating and extending decoupled architectures (Data prefetching)

Posted on: 2001-08-03 | Degree: Ph.D. | Type: Dissertation
University: University of California, Davis | Candidate: Rich, Kevin Donald | Full Text: PDF
GTID: 1468390014952826 | Subject: Computer Science
Abstract/Summary:
Decoupled processing was first developed in 1981 to overcome the problem of increasing memory latencies. In the decoupled processing model, the compiler separates a program into the operations needed to access memory (the access stream) and the operations needed to perform the program's actual computation (the execute stream). These instruction streams then run on separate but cooperating processors. Ideally, this allows the access stream to slip ahead of the execute stream and issue memory requests well before the data are consumed. This “slipping” is made possible by the extensive use of queues to buffer communication between the two processors, and between the processors and memory.

Early studies of decoupled processing using small, hand-compiled benchmarks indicated that significant speed-up was possible. However, large, compiler-generated benchmarks are needed to determine whether this performance potential is truly realizable. This dissertation describes the development of Daecomp, a compiler for decoupled access/execute architectures. It focuses on the techniques that are unique to decoupled processing (code partitioning, interprocessor copy operations, mitigating code expansion, the handling of function calls, etc.).

Simulations of Daecomp-generated code produced results consistent with published results for the Livermore Loops. Significantly poorer results were obtained on two sets of larger benchmarks, underscoring the hazards of relying on small, hand-compiled benchmarks for architectural evaluations. Investigations into the causes of this poor performance identified branch operations that force the access processor to wait on the execute processor as the primary culprit.

Based on these results, the decoupled-style prefetch architecture (D-SPA) is proposed.
D-SPA leverages the strengths and compiler techniques of decoupled processing. It also permits the use of advanced branch prediction and speculative execution techniques, which are unavailable in the decoupled model. Despite using the decoupled model as its basis, D-SPA is shown to easily maintain binary compatibility with a typical RISC processor.

Investigations into the performance potential of D-SPA using large commercial benchmarks indicate that D-SPA has the potential to prefetch data effectively. This would significantly reduce the cache miss rate and increase system performance.
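The slip mechanism and the branch problem described above can be made concrete with a minimal cycle-counting sketch in Python. This is not Daecomp, D-SPA, or the dissertation's simulator. The function `simulate_dae`, its parameters, and its timing rules are illustrative assumptions: one load per loop iteration, a fixed memory latency, one queue slot reserved per outstanding load, and an execute processor that consumes at most one value per cycle. The hypothetical `ap_waits_on_ep` flag models a branch whose outcome the access processor must obtain from the execute processor before it can issue the next load.

```python
from collections import deque


def simulate_dae(n_iters: int, mem_latency: int, queue_depth: int,
                 ap_waits_on_ep: bool = False) -> int:
    """Return the number of cycles until the execute processor (EP) has
    consumed all n_iters loaded values.

    Toy model (an assumption, not the dissertation's simulator):
    - the access processor (AP) issues at most one load per cycle;
    - each load's data arrives mem_latency cycles after issue;
    - a queue slot is reserved when a load is issued and freed when the
      EP consumes the value, so queue_depth bounds how far the AP can slip;
    - the EP consumes at most one value per cycle, in FIFO order.
    If ap_waits_on_ep is True, the AP may not issue the next load until the
    EP has consumed every earlier value (a hypothetical branch dependence).
    """
    if queue_depth < 1:
        raise ValueError("queue_depth must be at least 1")

    load_queue = deque()  # arrival cycle of each outstanding load, oldest first
    issued = consumed = cycle = 0
    while consumed < n_iters:
        # EP: consume the oldest value if its data has arrived by this cycle.
        if load_queue and load_queue[0] <= cycle:
            load_queue.popleft()
            consumed += 1

        # AP: issue the next load if work remains and a queue slot is free.
        can_issue = issued < n_iters and len(load_queue) < queue_depth
        if ap_waits_on_ep:
            # Branch dependence: the AP stalls until the EP has caught up.
            can_issue = can_issue and consumed == issued
        if can_issue:
            load_queue.append(cycle + mem_latency)
            issued += 1

        cycle += 1
    return cycle


if __name__ == "__main__":
    print(simulate_dae(100, 50, 64))                       # 150: AP slips ahead
    print(simulate_dae(100, 50, 64, ap_waits_on_ep=True))  # 5001: serialized
    print(simulate_dae(100, 50, 8))                        # 654: shallow queue limits slip
```

Under these assumptions, a fully decoupled run pays the memory latency roughly once (about `n_iters + mem_latency` cycles). When every iteration forces the access processor to wait on the execute processor, the latency is paid on every iteration. A queue shallower than the latency also throttles how far the access stream can run ahead.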
Keywords/Search Tags: Decoupled, D-SPA, Data, Compiler, Techniques, Performance, Memory