Parallel Simulation Of Large Scale Computer Systems

Posted on: 2014-01-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X D Zhu
GTID: 1228330398472871
Subject: Computer system architecture
Abstract/Summary:
Parallel simulation is a suitable solution for simulating large-scale computer systems. Ideally, it can bring the speed of a parallel simulator up to that of a uni-processor or single-node simulator. However, synchronization between nodes greatly hinders parallel simulators from reaching this ideal performance. Inappropriate synchronization can degrade performance by one or two orders of magnitude, and its influence grows as the number of simulation nodes scales up. In cycle-accurate parallel simulation, the synchronization mechanism must prevent every causality error, so the key lookahead value is tightly restricted by the features of the target architecture. Non-cycle-accurate parallel simulation, by contrast, can allow causality errors to occur, which permits a larger lookahead value. Although relaxed synchronization improves performance notably, the reduced accuracy it brings becomes another challenge. This dissertation provides a systematic scheme for the architectural parallel simulation of multi-core and datacenter systems, based on an analysis of their performance and accuracy issues. The research work focuses on the following aspects:

1. Cycle-accurate parallel simulation of multi-core systems. For multi-core processor simulation, this work designs a pending barrier synchronization mechanism that preserves cycle accuracy, and provides a suite of solutions for multi-thread optimization. The new synchronization method relies on pending barriers set in advance to ensure that logical processors receive zero-delay events in time. The optimizations include memory-access hash-table locks, private storage, local allocation space, and lockless queues, which address issues such as shared-memory emulation and communication between host threads. The experimental results show that the implemented simulator, PCASim, achieves an average speedup of 8.66X over sequential simulation when using 17 host threads.

2. Cycle-accurate parallel simulation of many-core systems. This work proposes a hierarchical synchronization mechanism that improves performance without reducing model detail in tiled many-core processor simulation. It divides each tiled node of the target processor model into two modules and uses three kinds of barriers to synchronize the modules and the tiled nodes, exploiting the parallelism inside the simulator while preserving cycle accuracy. An analysis proves that, with the same lookahead, the performance of the proposed mechanism is higher than that of the Quantum mechanism but less than twice as high. A simulator, MCASim, controlled by the proposed mechanism was implemented. The experiments show that it obtains a 22.0X speedup over sequential simulation when using 32 host threads.

3. Relaxed synchronization. By analyzing the factors that contribute to causality errors in relaxed synchronization, this work points out that keeping the simulation speeds of all nodes consistent leads to a low causality error rate. It proposes a novel mechanism based on wall-clock time (WBSP) that keeps the running speeds of different nodes consistent by periodically synchronizing their logical clocks with the wall clock within each lax step. The mechanism causes only a modest loss of precision while achieving performance close to that of lax synchronization. Further analysis of how real-world conditions influence WBSP identifies the conditions under which the mechanism is suitable. WBSP was implemented in MCASim. The experimental results show that it improves performance by 26.7% on average relative to the adaptive Slack mechanism when using 32 nodes.

4. Parallel simulation of datacenters. This work constructs a parallel datacenter simulator by integrating a full-system simulator and a network simulator. The simulator uses WBSP to synchronize nodes efficiently. Full-system simulation can support real systems and industrial benchmark suites, providing high fidelity and flexibility. The network simulator is also divided into partitions that run in parallel, to avoid a performance bottleneck. An analysis of the characteristics of datacenter simulation points out that the ratio of the lookahead value to the simulation speed helps WBSP extend its step duration. The experimental results show that the datacenter simulator achieves good performance under WBSP. With 32 nodes, it is 5.1X faster than the baseline barrier synchronization, and it improves on the recently proposed adaptive barrier mechanism by nearly 49.8%.
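
The role of the lookahead value in conservative synchronization can be illustrated with a minimal sketch of a generic quantum-based barrier scheme. This is not the dissertation's pending barrier, hierarchical, or WBSP mechanism, whose details the abstract does not give. The names `Node`, `run_node`, `simulate`, and the `LOOKAHEAD` value of 10 cycles are assumptions made for illustration. Every cross-node event is delayed by at least `LOOKAHEAD` cycles, so a node can safely simulate one quantum of that length before meeting the others at a barrier, and no event can arrive in its local past.

```python
import heapq
import threading

LOOKAHEAD = 10  # assumed minimum delay (simulated cycles) of any cross-node event


class Node:
    """One simulation node (hypothetical) running on its own host thread."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.now = 0                # logical clock of this node
        self.events = []            # heap of (timestamp, sender, hops_left)
        self.inbox = []             # events posted by other nodes
        self.inbox_lock = threading.Lock()
        self.log = []               # events in the order they were processed
        self.causality_errors = 0   # events that arrived in the local past

    def post(self, event):
        with self.inbox_lock:
            self.inbox.append(event)


def run_node(node, nodes, barrier, end_time):
    for quantum_start in range(0, end_time, LOOKAHEAD):
        quantum_end = quantum_start + LOOKAHEAD
        # Move events received so far into the local event heap.
        with node.inbox_lock:
            pending, node.inbox = node.inbox, []
        for event in pending:
            heapq.heappush(node.events, event)
        # Process only the events that fall inside the current quantum.
        while node.events and node.events[0][0] < quantum_end:
            timestamp, sender, hops_left = heapq.heappop(node.events)
            if timestamp < node.now:
                node.causality_errors += 1
            node.now = timestamp
            node.log.append((timestamp, sender, hops_left))
            if hops_left > 0:
                # Forward a token to the next node in a ring, delayed by LOOKAHEAD.
                neighbor = nodes[(node.node_id + 1) % len(nodes)]
                neighbor.post((timestamp + LOOKAHEAD, node.node_id, hops_left - 1))
        # No node starts the next quantum until all nodes have finished this one.
        barrier.wait()


def simulate(num_nodes, end_time, hops):
    nodes = [Node(i) for i in range(num_nodes)]
    nodes[0].post((0, 0, hops))  # initial token, injected at node 0 at time 0
    barrier = threading.Barrier(num_nodes)
    threads = [
        threading.Thread(target=run_node, args=(node, nodes, barrier, end_time))
        for node in nodes
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return nodes


if __name__ == "__main__":
    for node in simulate(num_nodes=4, end_time=100, hops=5):
        print(node.node_id, node.log, node.causality_errors)
```

In this sketch a smaller lookahead means shorter quanta and more barrier crossings per simulated cycle, which is the synchronization cost the abstract refers to. Relaxed schemes lengthen the step beyond the true lookahead and accept that some events may then arrive late.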
Keywords/Search Tags: Parallel simulation, architectural modeling, multi-core processor, datacenter systems, conservative synchronization, lax synchronization, full-system simulation