Font Size: a A A

Compiler Optimizations For Software-Managed On-Chip Memory

Posted on:2010-12-20Degree:DoctorType:Dissertation
Country:ChinaCandidate:L WangFull Text:PDF
GTID:1118360305973658Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Due to the big gap between processor performance and memory performance, it results in the memory wall problem, namely, the memory system has become the bottleneck. The traditional computer architecture adopts hardware-managed cache to address memory wall problem. However, along with the fast developing of application and technology, cache incurs some problems. In contrast to cache, software-managed on-chip memory has advantages in area, power and access speed, it is widely used in embedded-system, stream processor and graphics processing unit, and it is increasingly used in high performance architecture. It is regarded as an effective solution to memory wall problem. Different with hardware-managed cache, software-managed on-chip memory relies on software to explicitly manage all the data transfers between on-chip memory and off-chip memory, and decide when and where the data will enter into memory. Software-managed memory poses the compiler a significant challenge. How to guarantee the program correctness, utilize the limited on-chip memory space, avoid the memory fragments, fully capture the data reuse, optimize the inter-level communication, so as to minimize the memory bandwidth requirement, exploit the parallelism between computation and memory access, so as to hide the memory access latency, which are crucial for good performance of the programs running on software-managed memory system. This thesis emphasizes on compiler optimizations for software-managed on-chip memory allocation. In summary, this thesis makes the following contributions:(1) Proposes permutation graph coloring algorithm for scratchpad memory allocation. In embedded systems, the on-chip memory is often organized as scratchpad memory (SPM). This thesis researches the algorithms for SPM allocation, for the first time, gives the proof that most of the interference graphs of embedded applications are permutation graphs, enabling the optimal SPM allocation to be obtained in linear time. The theoretical analysis and experiments show that our algorithm improves the SPM utilization than the superperfect graph based SPM allocation algorithm, the best in the literature.(2) Proposes memory coloring based stream register file allocation framework. Stream architecture is high performance architecture for stream applications. It adopts software-managed on-chip memory, named stream register file (SRF), as the nexus. SRF is non-bypassing, thus all streams accessed by the computation must be made available in the SRF before the computation is started. A perfect SRF allocation scheme should avoid introducing extra off-chip memory transfers, efficiently capture the widely existed producer-consumer locality, and exploit the parallelism between computation and memory access. The thesis presents a memory coloring (memory partition plus graph coloring register allocation) based SRF allocation framework. The novelty in our research lies in the integration of reuse exploitation and parallelism exploitation into the classic graph coloring register allocation framework. In addition, according to the application characteristics, this thesis presents some improvements, such as incremental coalescing, register ordering, to the classic framework. The experiments show that the memory coloring based framework could efficiently exploit reuse and parallelism, without introducing spills.(3) Proposes an optimal directed path searching based stream register file allocation algorithm. Memory coloring based SRF allocation framework could efficiently exploit reuse and parallelism. However, the memory coloring technique easily introduces the SRF space waste. Another research emphasis of this thesis is when the interference graph is fixed (after the exploitation of reuse and parallelism through operating on the interference graph), how to minimize the needed SRF space, avoid introducing memory fragments. For the first time, we prove that the interference graphs of most of stream applications are comparability graphs, or can be decomposed into multiple comparability subgraphs, thus the optimal SRF allocation could be achieved in polynomial time. For the first time, we formalize the SRF allocation into optimal directed path searching, give a novel SRF allocation algorithm. The theoretical analysis and experiments show that our algorithm could find the optimal or near optimal SRF allocation, improve the SRF utilization than the widely used First-Fit algorithm, the best in the literature.(4) Proposes a hierarchical graph coloring algorithm for allocation of software- managed multi-level memory hierarchy. In modern high performance architecture, to balance the computation and memory access, optimize the memory bandwidth and latency, software-managed memory hierarchy is increasingly used instead of hardware-managed cache hierarchy. The traditional researches focus on single level memory, with insufficient considerations for global optimizations such as inter-level communication, which tend to be crucial for good performance. This thesis extends the graph coloring algorithm, for the first time, applies it to multi-level memory allocation. By modeling the memory hierarchy as a weighted graph, our method is applicable to any software-managed memory hierarchy organization. We also extend the concept of data interference graph, propose the path merging and path reduction techniques to reduce the inter-level communication. By using data live range extension technique, the parallelism is exploited to hide the memory access latency. All the optimizations are integrated into the graph coloring framework. The experiments show that our algorithm achieves a good performance.
Keywords/Search Tags:software-managed on-chip memory, memory management, scratchpad memory, stream register file, graph coloring, interval coloring, permutation graph, comparability graph
PDF Full Text Request
Related items