As an important component of computer architecture, main memory has a significant impact on system performance and energy efficiency. Volatile memory such as dynamic random access memory (DRAM) currently plays an important role in computer systems. DRAM offers low access latency, low power consumption, and long lifetime, and has therefore been widely employed as main memory. However, with the development of the Internet and cloud computing, the traditional memory system faces great challenges as the era of big data approaches. Over the past 30 years, CPUs have achieved a roughly 10,000-fold performance improvement in clock cycles, while the representative traditional memories SRAM and DRAM have achieved only 200-fold and 9-fold performance improvements, respectively. This leads to an ever-widening gap between CPU performance and memory performance. Moreover, both SRAM and DRAM have high dynamic power, accounting for 35%-55% of system power in data-centric applications, and their limited scalability becomes an obstacle to their use in future computer systems.

Emerging non-volatile memory (NVM) has been developed with smaller latency and higher bandwidth, and thus becomes a promising alternative to volatile memories. As a representative non-volatile memory, phase change random access memory (PRAM) exhibits non-volatility, high density, and near-zero static power. On the other hand, PRAM exhibits asymmetric read/write latency, with write latency more than 10 times the read latency, and it also suffers from limited write endurance. Consequently, combining traditional and emerging memories and designing energy-efficient main memory architectures for big data and cloud computing oriented applications has become a hot research topic in both industry and academia.

In this paper, we explore solutions for energy-efficient main memory.
We target a hybrid main memory architecture with both DRAM and PRAM, and focus on policies for memory management and scheduling, with the objective of building a high-performance, low-power main memory with prolonged endurance.

In the targeted hybrid main memory, DRAM and PRAM share a unified address space. The hybrid architecture combines the advantages of DRAM and PRAM, including DRAM's low access latency and PRAM's low static power and fast reads, but it also inherits their disadvantages, such as DRAM's large static power and refresh overhead, as well as PRAM's long write latency and limited endurance. To tackle these challenges, this work makes the following contributions.

First, two coarse-grained task assignment strategies are proposed: an integer linear programming (ILP) strategy and an offline adaptive space allocation algorithm (offline-ASA). The offline-ASA takes a static task set as input and sets up thresholds to allocate memory space for tasks, aiming to direct more write operations to DRAM so that dynamic energy and memory access latency can be improved by reducing writes to PRAM. In addition, to reduce the refresh power of DRAM banks, the proposed memory allocation makes full use of the active portions of DRAM and turns the remaining banks into the idle state. Simulation results show that the ILP strategy achieves an energy saving of 42.3% at a performance loss of 31.3%, while the offline-ASA algorithm delivers an energy saving of 35.1% at a performance cost of 17.1%. This scheme can be applied to Hadoop YARN as an energy-efficient scheduling solution to reduce the system energy of cloud computing systems.

On the basis of the offline space allocation scheme, this paper proposes an online adaptive space allocation algorithm (online-ASA).
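As a rough illustration, the threshold-based placement at the heart of the offline-ASA might be sketched as follows. This is a minimal sketch, assuming a greedy most-writes-first ordering; the function name, the write-count threshold, and all task and capacity figures are hypothetical, not taken from the paper.

```python
# Hypothetical sketch of threshold-based offline space allocation
# (offline-ASA). The greedy ordering, threshold, and all figures are
# illustrative assumptions, not the paper's exact formulation.

def offline_asa(tasks, dram_capacity, write_threshold):
    """Assign each task's memory space to DRAM or PRAM.

    tasks: list of (name, size, write_count) tuples, known statically.
    Write-heavy tasks go to DRAM while its capacity lasts, so the
    limited DRAM budget absorbs as many writes as possible; the rest
    go to PRAM, whose slow, energy-hungry writes are thus avoided.
    """
    placement = {}
    dram_used = 0
    # Consider the most write-intensive tasks first.
    for name, size, writes in sorted(tasks, key=lambda t: -t[2]):
        if writes >= write_threshold and dram_used + size <= dram_capacity:
            placement[name] = "DRAM"
            dram_used += size
        else:
            placement[name] = "PRAM"
    # Only dram_used units of DRAM need to stay active; the remaining
    # banks can be put into an idle state to save refresh power.
    return placement, dram_used

tasks = [("t1", 64, 9000), ("t2", 128, 100), ("t3", 32, 5000)]
placement, active_dram = offline_asa(tasks, dram_capacity=128,
                                     write_threshold=1000)
# t1 and t3 (write-heavy) land on DRAM; t2 goes to PRAM
```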
The online-ASA provides online dynamic space allocation for periodic tasks, with the objective of reducing system energy consumption and wear at the cost of a reasonable performance loss while maintaining schedulability. For example, due to the long write latency of PRAM, a task allocated to PRAM will have a longer execution time; if the execution time exceeds the deadline, schedulability is violated. Consequently, schedulability is guaranteed first, and then tasks are categorized based on their write counts and thresholds. Write-intensive tasks are allocated to DRAM to achieve fast execution and, at the same time, reduce writes to PRAM to save memory energy. DRAM refresh energy is also reduced by dynamic DRAM management. Simulations show a 27.01% energy reduction at the cost of a 13.6% performance loss on average.

The page is the finer-grained scheduling unit in traditional memory schedulers, and page-level scheduling in hybrid main memory has become a hot topic. This paper proposes a page caching and scheduling algorithm for unified-address hybrid memory, called CLOCK-HM. The traditional CLOCK algorithm is designed for volatile main memory: it pursues a high page hit ratio by recording the access frequency of each page, but it does not consider read/write characteristics and does not differentiate between read-intensive and write-intensive pages, making it unsuitable for hybrid main memory since it ignores the distinct features of DRAM and PRAM. CLOCK-HM improves the traditional CLOCK algorithm by taking DRAM and PRAM features into account to adapt to the hybrid main memory architecture. It maintains two circular linked lists and several control flags, based on which it achieves memory management with write reduction on PRAM, migration control between DRAM and PRAM, and a satisfactory cache hit ratio.
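The two-list structure described above can be sketched roughly as follows. This is a simplified, hypothetical reconstruction: the class names, the per-page write counter, and the migration threshold are illustrative assumptions, not the paper's actual design.

```python
# Simplified, hypothetical sketch of the CLOCK-HM idea: one CLOCK
# (second-chance) list per memory type, per-page flags, and migration
# of write-hot PRAM pages into DRAM. Names and thresholds are
# illustrative assumptions, not the paper's actual design.
from collections import OrderedDict

class ClockList:
    """A minimal CLOCK list: page id -> reference bit."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def touch(self, page):
        """Record an access; return an evicted victim, if any."""
        if page in self.pages:
            self.pages[page] = 1          # second chance
            return None
        victim = self.evict() if len(self.pages) >= self.capacity else None
        self.pages[page] = 1
        return victim

    def evict(self):
        # Sweep the clock hand: clear reference bits until a page
        # with bit 0 is found, then evict that page.
        while True:
            page, ref = next(iter(self.pages.items()))
            del self.pages[page]
            if ref == 0:
                return page
            self.pages[page] = 0          # hand passes; bit cleared

class ClockHM:
    def __init__(self, dram_frames, pram_frames, migrate_threshold=2):
        self.dram = ClockList(dram_frames)
        self.pram = ClockList(pram_frames)
        self.writes = {}                  # per-page PRAM write counters
        self.migrate_threshold = migrate_threshold

    def access(self, page, is_write):
        if page in self.dram.pages:
            self.dram.touch(page)
        elif page in self.pram.pages:
            self.pram.touch(page)
            if is_write:
                self.writes[page] = self.writes.get(page, 0) + 1
                if self.writes[page] >= self.migrate_threshold:
                    # Write-hot page: migrate PRAM -> DRAM to cut
                    # further PRAM writes; demote any DRAM victim.
                    del self.pram.pages[page]
                    del self.writes[page]
                    victim = self.dram.touch(page)
                    if victim is not None:
                        self.pram.touch(victim)
        else:
            # Page fault: write faults go to DRAM (demoting a victim
            # to PRAM if needed), read faults go to PRAM.
            if is_write:
                victim = self.dram.touch(page)
                if victim is not None:
                    self.pram.touch(victim)
            else:
                self.pram.touch(page)
```

On an access trace, repeated writes to a PRAM-resident page push its counter over the threshold and move the page to DRAM, trading an occasional migration for a sustained reduction in PRAM writes.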
Simulations show that, compared with other scheduling algorithms for hybrid main memory such as CLOCK-DWF and LRU-WPAM, CLOCK-HM exhibits significant advantages in the number of writes to PRAM, the number of migrations, main memory energy consumption, and timing overhead, while maintaining a high cache hit ratio. Consequently, CLOCK-HM is a better scheduler for the hybrid main memory architecture.

Emerging non-volatile memory clearly has advantages as future main memory; however, few evaluation platforms exist today, resulting in a lack of real measurements and convincing assessments in related research. A hybrid-memory-based prototype would therefore be valuable for supporting real implementations and effective evaluations. In this paper, two hybrid main memory prototypes are designed and implemented. In the first prototype, PRAM is employed as the high-density main memory and a small amount of DRAM is used as its cache. This architecture combines PRAM's high density with DRAM's low latency, and reduces writes to PRAM for endurance considerations. In the proposed design, the PRAM chips are partitioned into groups: to improve access performance, chips within each group can be accessed in parallel, and the groups are further pipelined. In the second prototype, magnetic random access memory (MRAM) and DRAM are combined in a unified address space. Forty-eight MRAM chips are designed with parallelized memory access, achieving a read performance of 2 GB/s. These prototypes provide important parameters for hybrid memory systems and can serve as simulators in related research.
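For a rough sense of what the 2 GB/s figure implies per chip, assuming the aggregate read bandwidth is delivered evenly by the 48 MRAM chips accessed in parallel (an illustrative assumption, not a measured figure):

```python
# Back-of-envelope estimate: per-chip read bandwidth, assuming the
# 2 GB/s aggregate is shared evenly by 48 MRAM chips in parallel
# (an illustrative assumption, not a measured figure).
aggregate_mb_s = 2 * 1024          # 2 GB/s expressed in MB/s
num_chips = 48
per_chip_mb_s = aggregate_mb_s / num_chips
print(round(per_chip_mb_s, 1))     # prints 42.7 (MB/s per chip)
```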