
Workload Cloning: Emulating Memory Hierarchy Behavior using Stochastic Traces

Posted on: 2015-10-08 | Degree: Ph.D. | Type: Dissertation
University: North Carolina State University | Candidate: Balakrishnan, Ganesh | Full Text: PDF
GTID: 1470390020450718 | Subject: Engineering
Abstract/Summary:
The chip multiprocessor (CMP) is the most prevalent processor design today. The number of cores consolidated per chip has grown rapidly as transistor density has increased in accordance with Moore's law. As the number of cores grows, multiple diverse workloads compete for the caches, memory controller (MC) and DRAM resources, collectively referred to as the memory hierarchy. This puts an increasing burden on the limited memory hierarchy resources, making the memory hierarchy the key bottleneck in CMPs. In turn, this shifts the CMP design focus from the core to the memory hierarchy. In this dissertation, we investigate two black-box cloning techniques that can be used to explore the memory hierarchy design space. Although solving the simulation time problem is not the primary focus of cloning, a reduced representation of the original workload invariably reduces simulation time.

The first goal of this work is to solve the proprietary workload problem for the cache hierarchy in CMPs. We propose Workload Emulation using Stochastic Traces (WEST), a highly accurate black-box cloning technique for replicating the data cache behavior of arbitrary programs. First, we analyze which profiling statistics are necessary and sufficient to capture a workload. We show the importance of capturing temporal behavior over spatial or strided behavior. We find that traditional statistics used for representing temporal accesses, such as stack distance, are necessary but insufficient. Second, we stochastically generate a clone that produces statistics identical to those of the proprietary workload. WEST clones can be used for exploring cache sizes, associativities, write policies, replacement policies, cache hierarchies and co-scheduling, at a significantly reduced simulation time. We use a simple IPC model to control the rate of accesses to the cache hierarchy. Our evaluation over a wide cache design space for single-core and dual-core CMPs shows that WEST can faithfully capture the behavior of the original workload. In addition to solving the proprietary workload problem, WEST also reduces simulation time by an order of magnitude.

The second goal of this work is to create a cloning framework for studying and exploring the MC/DRAM design space. We propose Memory Emulation using Stochastic Traces (MEMST), a black-box cloning framework for accurately modeling MC/DRAM behavior. We provide a detailed analysis of the statistics that are necessary to model a workload accurately from an MC/DRAM perspective. We find that temporal locality in MC/DRAM is a function of DRAM organization (rows, columns, ranks, banks, etc.) and various DRAM timing parameters (e.g., time to precharge, activate, read row, read column, etc.). Furthermore, MC/DRAM has its own design issues, such as DRAM frequency, page policy, scheduling policy, etc., that require unique statistics. We also show how a clone can be generated from these statistics using a novel stochastic method. Finally, we validate our framework across a wide design space by varying DRAM organization, address mapping, DRAM frequency, page policy, scheduling policy, input bus bandwidth, chipset latency, DRAM die revision, DRAM generation and DRAM refresh policy. Our evaluation of MEMST across this design space for single-core, dual-core, quad-core and octa-core CMPs shows that it can accurately mimic the row buffer miss ratio, transaction latency, memory bandwidth and DRAM power of multi-program and parallel workloads. Similar to WEST, MEMST clones can also significantly reduce simulation time. (Abstract shortened by UMI.)
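For readers unfamiliar with the stack distance statistic mentioned in the abstract, the following is a minimal illustrative sketch in Python. It is not the WEST algorithm: the abstract states that stack distance alone is necessary but insufficient, and the additional statistics WEST uses are not given here. The function names (stack_distance_profile, lru_miss_ratio, generate_clone), the COLD marker, and the choice of a fully associative LRU model are assumptions made only for this illustration. The sketch profiles a trace of block addresses, derives a miss ratio from the profile, and then draws a synthetic trace whose accesses are sampled from the same stack distance histogram.

```python
import random
from collections import Counter

COLD = -1  # hypothetical marker for a first-time (cold) access


def stack_distance_profile(trace):
    """Histogram of LRU stack distances for a sequence of block addresses."""
    stack = []       # most recently used block sits at index 0
    hist = Counter()
    for addr in trace:
        if addr in stack:
            d = stack.index(addr)   # number of distinct blocks touched since last use
            stack.pop(d)
        else:
            d = COLD
        hist[d] += 1
        stack.insert(0, addr)
    return hist


def lru_miss_ratio(hist, capacity):
    """Miss ratio of a fully associative LRU cache holding `capacity` blocks."""
    total = sum(hist.values())
    misses = sum(n for d, n in hist.items() if d == COLD or d >= capacity)
    return misses / total


def generate_clone(hist, length, seed=0):
    """Draw a synthetic trace by sampling stack distances from the histogram."""
    rng = random.Random(seed)
    distances = sorted(hist)
    weights = [hist[d] for d in distances]
    stack, clone, next_new = [], [], 0
    for _ in range(length):
        d = rng.choices(distances, weights=weights)[0]
        if d == COLD or d >= len(stack):
            # No block exists at that depth yet: emit a fresh address instead.
            addr = next_new
            next_new += 1
        else:
            addr = stack.pop(d)     # reuse the block found at depth d
        stack.insert(0, addr)
        clone.append(addr)
    return clone


if __name__ == "__main__":
    original = [1, 2, 3, 1, 2, 3, 4, 1]
    profile = stack_distance_profile(original)
    print(dict(profile))
    print(lru_miss_ratio(profile, capacity=3))
    print(generate_clone(profile, length=16))
```

The clone contains only synthetic addresses, which is the sense in which such a trace can stand in for a proprietary workload. Because sampling is independent per access, the clone's profile only approximates the original's, and it carries no spatial, strided, or timing information.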
Keywords/Search Tags:Workload, Memory hierarchy, Simulation time, Using stochastic, WEST, DRAM, Behavior, Cloning