Research On Memory Simulation And Optimizations In CMPs

Posted on: 2008-08-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Gao
Full Text: PDF
GTID: 1118360212998669
Subject: Computer system architecture
Abstract/Summary:
Driven by technology push and application pull, Chip Multiprocessors (CMPs) have become the trend in high-performance microprocessors. Because cores contend heavily for the limited memory resources on a single chip, memory access latency becomes the major bottleneck of CMP performance. Meanwhile, since the target workloads of CMPs are diverse, traditional performance evaluation and simulation environments face new requirements. The thesis focuses on multicore simulation, effective designs of the CMP cache hierarchy, and optimizations of the CMP memory controller. The contributions of the thesis include:

1. Based on the SimOS full-system environment, a new multicore full-system simulator for Godson processors, SimOS-Godson, has been designed and implemented. SimOS-Godson decouples functional simulation from timing simulation and adopts a new value-prediction approach to implement memory consistency in the simulation environment. Its credibility and accuracy are established by cross-validating the simulator against actual hardware. The simulator inherits the high speed and flexibility of traditional user-level simulators and adds accuracy, full-system support, and ease of use. With the entire Linux OS ported, microarchitecture and workloads can be analyzed and evaluated easily in the SimOS-Godson full-system environment. On a 3.0 GHz Pentium 4 machine, SimOS-Godson simulates more than 300K instructions per second. SimOS-Godson will play a key role in research on future Godson multicore architectures.

2. The thesis presents a dynamic page management strategy for the CMP memory controller. The performance evaluation and program behavior analysis show that the access stream of an individual thread in a multi-threaded application has good page locality, but the interleaving of accesses from multiple cores heavily damages the page locality seen by the shared memory controller. A novel history-based prediction scheme, which can detect hot-core access, is proposed to provide accurate predictions for CMP page management. Because the CMP memory controller has a larger scheduling window, an out-of-order scheduling policy for memory accesses is designed to reduce useless precharge latency. With the two techniques combined, the dynamic strategy delivers an 8.6% performance increase for multi-threaded applications.

3. Because of the wire delay problem and the diversity of applications, neither private nor shared caches can provide both large capacity and fast access in CMPs. The thesis presents a novel CMP cache design, Heterogeneous CMP Cache (HCC), in which chips are built from tiles of two different categories. By incorporating indirect-index cache technology to share capacity between different levels of the hierarchy, HCC provides an on-chip memory subsystem that is both capacity-effective and fast to access. Detailed full-system simulations are used to analyze HCC performance for various programs, including SPEC CPU2000, SPLASH2, and commercial workloads. The results show that HCC improves performance by 16.2% for single-threaded benchmarks and 9.1% for multi-threaded benchmarks. HCC is easy to implement, and its design ideas will be used in future multicore processors of the Godson series.
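To make the idea of history-based page-mode prediction in the second contribution more concrete, here is a minimal sketch in Python. It is not the scheme from the thesis: the abstract gives no implementation details, so the per-bank 2-bit saturating counter, the class name `BankPredictor`, and its methods are all assumptions chosen for illustration. The sketch does not model hot-core detection or out-of-order scheduling. It only shows how recent row hits and misses could steer the choice between keeping a DRAM page open and precharging it early.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BankPredictor:
    """Hypothetical per-bank predictor (an assumption, not the thesis design).

    A 2-bit saturating counter tracks recent row locality:
    values 2..3 favor keeping the page open, values 0..1 favor closing it.
    """
    counter: int = 2                 # start weakly in favor of open-page
    last_row: Optional[int] = None   # row touched by the previous access

    def predict_keep_open(self) -> bool:
        # Keep the page open only while recent history shows row hits.
        return self.counter >= 2

    def update(self, row: int) -> bool:
        """Record an access to `row`; return True if it matched the previous row."""
        hit = self.last_row == row
        if hit:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)
        self.last_row = row
        return hit


if __name__ == "__main__":
    predictor = BankPredictor()
    # A single thread streaming through one row: locality is good.
    for row in [5, 5, 5]:
        predictor.update(row)
    print(predictor.predict_keep_open())
    # Two cores interleaving accesses to different rows: locality is destroyed.
    for row in [1, 2, 1, 2]:
        predictor.update(row)
    print(predictor.predict_keep_open())
```

In this toy model, a stream confined to one row pushes the counter toward the open-page policy, while interleaved accesses to alternating rows push it toward the close-page policy. This mirrors the observation in the abstract that multi-core interleaving damages the page locality seen by a shared memory controller.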
Keywords/Search Tags: Chip-MultiProcessors, performance simulation, memory access scheduling, page mode prediction, cache-hierarchy, heterogeneous, indirect index cache