On-chip memory is widely employed in commercial computing devices to bridge the performance gap between processors and off-chip memory. On-chip caches are used throughout general-purpose computing systems, including high-performance mobile phones, tablets, notebooks, desktops, and servers. Alternatively, scratchpad memory (SPM), a software-managed on-chip memory, is commonly applied in embedded systems for its energy efficiency and area cost effectiveness. Traditionally, on-chip memory is built entirely from SRAM. However, as the number of CMOS transistors grows, leakage power consumption becomes a critical issue. In addition, the large cell area of SRAM severely limits its scalability.

Non-volatile memories (NVMs), featuring low leakage power and high storage density, offer a new way to address both the leakage power problem and the scalability problem. Furthermore, as NVM technologies advance, emerging technologies such as spin-transfer torque RAM (STT-RAM) and phase change memory (PCM) can provide access speeds comparable to SRAM. Considering these advantages, researchers have recently proposed employing NVMs to build on-chip memory. However, write operations on NVMs incur considerably higher energy and longer latency. Therefore, applying NVMs to on-chip memory requires taking advantage of NVM while mitigating its costly write operations. This thesis argues that compilation techniques can help make good use of NVM's low leakage power and high storage density while mitigating NVM's costly write operations.
Specifically, this thesis consists of three topics: 1) a compilation-based approach to improve the performance of NVM-based hybrid SPM; 2) a compiler-assisted approach to improve the performance of NVM-based hybrid cache; and 3) a compilation-based approach to improve the performance of volatile-NVM-based cache.

For the first topic, a data allocation method based on a graph coloring model is proposed for NVM-based hybrid SPM. Each read or write access consumes energy as well as clock cycles. Compared to NVM, SRAM often performs better for write accesses but worse for read accesses. Therefore, a good allocation strategy should assign SPM space to data objects according to their read and write frequencies. In addition, data objects with disjoint live ranges can share storage space; exploiting this principle creates more opportunities to allocate data objects to their preferred SPM space. Motivated by these two observations, this thesis proposes an iterative graph-coloring (IGC) approach that allocates data objects to their preferred SPM space to improve the performance of hybrid SPM.

For the second topic, a compiler-assisted migration-aware method is proposed for NVM-based hybrid cache. Migration schemes are commonly employed in hybrid caches to dynamically move write-intensive data blocks from the NVM part to the SRAM part, thereby eliminating a great number of costly write operations on NVM. However, hardware-enabled migration schemes rely on naive prediction to identify potential write-intensive data blocks and move them from NVM to SRAM. This prediction-based scheme may trigger frequent cache migration operations and thus degrade system performance. This thesis proposes a compiler-assisted method that identifies migration-intensive blocks, pre-fetches them from main memory into SRAM cache blocks, and disables migration for these cache blocks.
As a result, SRAM's advantage in write performance is fully exploited, and a significant number of migration operations triggered by these data blocks are eliminated.

For the third topic, a compilation-based refresh-aware method is proposed for volatile-NVM-based cache. Refresh schemes are indispensable for volatile caches. However, frequent refresh operations introduce significant overhead, and this overhead is affected by the program's data layout. This is because, when the program writes a data object, the targeted cache block is implicitly refreshed as a whole. Based on this observation, this thesis proposes a data assignment method that reasonably distributes data writes across cache blocks, with the goal of minimizing refresh operations.
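As a rough illustration of the refresh-aware idea, the sketch below greedily spreads write-intensive data objects across cache blocks so that every block keeps receiving writes, and therefore implicit refreshes. The function name, the input format, and the greedy load-balancing heuristic are illustrative assumptions, not the thesis's exact algorithm.

```python
# Minimal sketch (assumed names/heuristic): balance per-block write
# traffic so implicit refreshes cover all blocks, reducing explicit
# refresh operations.
import heapq

def assign_objects(write_counts, num_blocks):
    """Assign each data object to a cache block, always choosing the
    block that currently receives the fewest writes."""
    # Min-heap of (accumulated writes, block id).
    heap = [(0, b) for b in range(num_blocks)]
    heapq.heapify(heap)
    placement = {}
    # Place the most write-intensive objects first.
    for obj, writes in sorted(write_counts.items(), key=lambda kv: -kv[1]):
        total, block = heapq.heappop(heap)
        placement[obj] = block
        heapq.heappush(heap, (total + writes, block))
    return placement

counts = {"a": 90, "b": 60, "c": 50, "d": 40}
print(assign_objects(counts, 2))  # → {'a': 0, 'b': 1, 'c': 1, 'd': 0}
```

With this placement, each block accumulates a comparable write rate (block 0: 130, block 1: 110), so writes implicitly refresh both blocks at similar intervals; a real compiler pass would additionally weigh liveness and the NVM retention window.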