
Research On Non-volatile Hybrid Memory Architecture For GPGPUs

Posted on: 2019-08-05    Degree: Doctor    Type: Dissertation
Country: China    Candidate: G Wang    Full Text: PDF
GTID: 1368330572956674    Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet of Things and cloud computing, the information age is rapidly giving way to the era of big data, and the demands on data storage and processing, such as high-performance computing and big data analysis, keep rising. Many studies use graphics processing units (GPUs) for high-performance computing and big data analysis, and most energy-efficient supercomputers rely heavily on general-purpose graphics processing units (GPGPUs) to scale up their parallel and floating-point throughput. The massively parallel execution model of traditional GPUs was effective at hiding the long latency of off-chip memory accesses. However, in many GPGPU applications, memory accesses are often data-dependent and exhibit less spatial locality than traditional graphics applications, so the massively parallel execution model alone cannot effectively hide memory latency. As memory-intensive applications become more prevalent on GPGPUs, they present a huge challenge to GPGPU storage systems.

Over the past few decades, great progress has been made in reducing processor energy consumption; however, with the continuous development of big data and the continuous increase in data volume, the proportion of energy consumed by storage keeps growing. The storage system now accounts for 40% of total system energy consumption, and in data-intensive applications it can account for 55%. This is because static random access memory (SRAM) has high operating power, while dynamic random access memory (DRAM) has unavoidable refresh power. At the same time, traditional storage systems face integration-technology bottlenecks that limit their scalability.

In recent years, emerging Non-volatile Memories (NVMs) have offered opportunities to change and improve traditional storage systems. Non-volatile memory is outstanding for boosting system
performance and saving storage power thanks to its non-volatility, high integration density, low power, and good scalability. Because different non-volatile memories are built from different storage materials, they can be deployed at various levels of the traditional storage hierarchy to enable optimization and change. However, compared with traditional volatile memory, non-volatile memory suffers from long write latency, unbalanced read/write performance, and limited write endurance. A hybrid storage architecture that combines conventional volatile memory with non-volatile memory is therefore an effective way to address these problems: with appropriate optimization strategies, such a hybrid architecture can make full use of the advantages of both memory types while mitigating or avoiding the disadvantages of each.

This dissertation focuses on the design and optimization of GPGPUs built on a non-volatile hybrid storage architecture, with the goals of improving system performance, reducing system energy consumption, and extending the lifetime of the storage system. It proposes a unified addressable hybrid memory architecture for GPGPUs composed of DRAM and NVM. The hybrid architecture has the following characteristics: DRAM offers low read and write latency and fast reads and writes, but high static power and unavoidable refresh power; NVM offers low static power, better scalability, and read latency similar to that of DRAM, but longer write latency and high, unavoidable write power. The research in this dissertation starts from these observations.

To reduce the impact of NVM's high write latency on system performance, this dissertation proposes a hybrid-memory-aware shared last-level cache (LLC) management strategy at the LLC layer of the GPU architecture. It exploits the asymmetric read/write latencies of the media in the hybrid memory and the memory coalescing
features of GPGPUs to divide cache lines into different types. Since memory operations on NVM have a greater impact on system performance, and cache lines fetched by memory requests with different numbers of valid addresses have different probabilities of being accessed again, a static hybrid-memory-based cache management strategy is proposed: each cache line is assigned a fixed priority, covering both the insertion priority on a cache miss and the promotion priority on a cache hit. Furthermore, a dynamic strategy is proposed to flexibly adjust cache-line priorities in response to dynamically changing memory access patterns; to this end, a set of schemes including dynamic cache insertion, cache bypassing, and dynamic cache promotion is developed. Experimental results show that, in the context of a hybrid main memory system, the hybrid-memory-aware shared LLC management strategy improves performance over the traditional Least Recently Used (LRU) policy by 12.78% on average and by up to 27.76%.

To reduce the impact of NVM's higher write power on memory energy consumption, a memory-latency-divergence-aware scheduling strategy for hybrid memory is designed in the memory controller of the GPU memory layer. To obtain higher bandwidth utilization, modern GPU memory controllers reorder memory requests from different warps. Such out-of-order scheduling often lets one warp's requests be preempted by another warp's, causing memory latency divergence and thus reducing system performance. Hybrid memory architectures also affect the GPU's memory scheduling policy: a warp with more NVM requests can be blocked for longer. Therefore, this dissertation divides access requests into different warp-groups according to the warps they belong to, and then assigns the warp-groups different dispatch priorities according to the memory types they access. The GPU memory controller is redesigned, including the
warp-group-aware scheduling queues and the scheduling strategy of the transaction scheduler. To reduce both the influence of memory latency divergence among concurrent thread groups on system performance and the impact of hybrid memory on GPU memory scheduling, a hybrid-memory- and warp-aware memory scheduling strategy for GPGPUs is proposed. The strategy reorders memory access requests in the memory controller based on their cache behavior, so that all memory requests of the same warp are served as soon as possible. Experiments show that, for memory-intensive applications, the proposed memory scheduling mechanism improves system performance by 15.69% and reduces the power consumption of the memory subsystem by 21.27%.

The cache management and memory scheduling strategies designed for hybrid memory improve system performance and reduce system energy consumption. However, the limited write endurance of non-volatile memory restricts its application, so this dissertation also designs a wear leveling strategy for Phase Change Memory (PCM). In the newly designed memory controller, the PCM space is divided into hot and cold regions by evaluating the write counts of data, and certain sub-areas are selected from these regions for balance-oriented movement. The hot region is moved periodically within the PCM chip; when a movement of the hot region is triggered, several small areas inside it move together. Experimental comparison shows that, compared with Start-Gap, the proposed wear leveling algorithm reduces the maximum number of bit flips in PCM by 57.81%, distributes write operations evenly across the entire PCM address space, and effectively extends the lifetime of the PCM memory by a factor of 4 to 5.
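The hybrid-memory-aware LLC insertion and promotion scheme described above can be sketched as follows. This is a minimal illustrative model, not the dissertation's implementation: the priority values, the promotion steps, and the two-way classification by backing memory are all assumptions, since the abstract does not give concrete parameters.

```python
# Sketch of a hybrid-memory-aware LLC set with priority-based
# insertion (on miss) and promotion (on hit). All numeric values
# are hypothetical.

NVM, DRAM = "nvm", "dram"

# Lines backed by NVM are costlier to refill (long write latency),
# so they are inserted and promoted with higher retention priority.
INSERTION_PRIORITY = {NVM: 3, DRAM: 1}
PROMOTION_STEP = {NVM: 2, DRAM: 1}
MAX_PRIORITY = 7

class CacheSet:
    def __init__(self, ways):
        self.lines = {}   # tag -> current priority
        self.ways = ways

    def access(self, tag, mem_type):
        """Returns True on hit, False on miss."""
        if tag in self.lines:
            # Hit: promote by a step that depends on the backing memory.
            self.lines[tag] = min(MAX_PRIORITY,
                                  self.lines[tag] + PROMOTION_STEP[mem_type])
            return True
        # Miss: evict the lowest-priority line if the set is full,
        # then insert with a memory-type-dependent priority.
        if len(self.lines) >= self.ways:
            victim = min(self.lines, key=self.lines.get)
            del self.lines[victim]
        self.lines[tag] = INSERTION_PRIORITY[mem_type]
        return False
```

With this policy, a DRAM-backed line inserted at low priority is evicted before an NVM-backed line that has been hit once, which is the intended bias: avoid refilling lines whose misses would trigger slow NVM operations.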
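The warp-group scheduling idea can likewise be sketched in a few lines: pending requests are grouped by warp, and whole groups are drained back-to-back, ranked so that warps touching fewer slow NVM locations finish first. The ranking key and the request tuple layout are assumptions for illustration; the dissertation's transaction scheduler is more elaborate.

```python
# Sketch of warp-group-aware request scheduling: group pending
# requests by warp, rank warp-groups (fewer NVM requests first,
# then smaller groups first), and issue each group contiguously
# so a warp's requests complete close together.

from collections import defaultdict

def schedule(requests):
    """requests: list of (warp_id, address, mem_type) tuples."""
    groups = defaultdict(list)
    for req in requests:
        groups[req[0]].append(req)

    def group_key(warp_id):
        reqs = groups[warp_id]
        nvm_count = sum(1 for _, _, mt in reqs if mt == "nvm")
        return (nvm_count, len(reqs))

    order = []
    for warp_id in sorted(groups, key=group_key):
        order.extend(groups[warp_id])   # serve a whole warp-group together
    return order
```

Serving a whole warp-group contiguously is what attacks memory latency divergence: a warp can only resume once its slowest request returns, so interleaving its requests with other warps' only delays its completion.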
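Finally, the hot/cold-region wear leveling can be sketched as a write-count-driven remapping. The threshold, region granularity, and swap-with-coldest policy are simplifying assumptions; the dissertation's scheme additionally rotates several sub-areas of the hot region together, and actual data migration is elided here.

```python
# Sketch of hot/cold-region wear leveling for PCM: count writes per
# physical region, and when a region crosses a (hypothetical)
# threshold, swap its mapping with the coldest region so future
# writes to the hot logical region land on little-worn cells.

class WearLeveler:
    def __init__(self, num_regions, threshold):
        self.mapping = list(range(num_regions))   # logical -> physical region
        self.writes = [0] * num_regions           # per-physical-region counts
        self.threshold = threshold

    def write(self, logical_region):
        phys = self.mapping[logical_region]
        self.writes[phys] += 1
        if self.writes[phys] >= self.threshold:
            self._swap_with_coldest(logical_region)

    def _swap_with_coldest(self, logical_region):
        # Find the least-written physical region and exchange mappings
        # (the physical data copy that a real controller performs is omitted).
        cold_phys = min(range(len(self.writes)), key=self.writes.__getitem__)
        cold_logical = self.mapping.index(cold_phys)
        self.mapping[logical_region], self.mapping[cold_logical] = (
            self.mapping[cold_logical], self.mapping[logical_region])
```

Compared with Start-Gap's fixed one-line rotation, a count-driven scheme like this reacts to the observed write skew, which is consistent with the abstract's claim of reducing the maximum number of bit flips on the hottest cells.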
Keywords/Search Tags:Non-volatile Memory, GPGPUs, Cache Management, Hybrid Memory Architecture, Memory Scheduling, Wear Leveling