Parallel simulation and multiple-path execution techniques for chip-multiprocessor architectures

Posted on:2002-03-15

Degree:Ph.D

Type:Dissertation

University:University of Florida

Candidate:Chidester, Matthew Charles

GTID:1468390011497394

Subject:Engineering

Abstract/Summary:

Integrating multiple processing elements onto a single integrated circuit to form a chip-multiprocessor (CMP) has been proposed as a solution to the problem of increased wiring delays between elements of an integrated circuit. This dissertation exploits the architecture of a CMP both to reduce the simulation time required to study such chips and to increase the performance of applications running on such a device.

The complexity of parallel systems has increased both the need for comprehensive simulation and the computation time required to perform the simulations. CMP architectures are particularly susceptible to this effect, combining the requirements of a microprocessor simulator with those of a parallel system. In the first part of this dissertation, a portable, distributed simulator for CMPs is developed and presented; it is based on the Message Passing Interface (MPI) and designed to run on a cluster of workstations. Because the simulator itself is a complex application, microbenchmark-based evaluation is used to compare parallelization algorithms and interconnects for use in the parallel simulator while identifying potential bottlenecks. The best combination is shown to yield speedups of up to 16 on a 9-node cluster of dual-CPU workstations.

The tight coupling of processing units in a CMP allows new forms of parallelism to be exploited. The second part of this dissertation studies multiple-path execution (MPE) on a CMP design to provide speedup on unmodified sequential code by exploring different paths of a conditional branch on separate processors. The impact on MPE performance of processor complexity and count, cache and branch prediction architecture, processor-to-path allocation strategies, and limited interprocessor communication capabilities is explored. Simulation shows a 12.7% speedup in instructions per cycle (IPC) on SPECint95, with up to 33.5% on benchmark components with poor branch prediction accuracy. This level of speedup is achievable on an 8-processor, 8-issue CMP with a simple mesh interconnect with realistic latencies and limited bandwidth.
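
To put the reported figures in proportion, the short Python sketch below works through the arithmetic. It is not taken from the dissertation. It assumes that the speedup of 16 is measured against a single CPU and that all CPUs in the cluster (9 nodes with 2 CPUs each) take part in the simulation; the abstract states neither. The function names `parallel_efficiency` and `percent_to_ratio` are hypothetical helpers introduced only for illustration.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Return speedup divided by the number of workers (1.0 would be ideal)."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup / workers


def percent_to_ratio(percent_gain: float) -> float:
    """Convert a percentage gain (e.g. 12.7) into a ratio (e.g. 1.127)."""
    return 1.0 + percent_gain / 100.0


# Figures quoted in the abstract.
NODES = 9
CPUS_PER_NODE = 2
REPORTED_SPEEDUP = 16.0
IPC_GAIN_SPECINT95 = 12.7   # percent, overall
IPC_GAIN_BEST_CASE = 33.5   # percent, components with poor branch prediction

# Assumption: every CPU in the cluster is used by the simulator.
total_cpus = NODES * CPUS_PER_NODE
efficiency = parallel_efficiency(REPORTED_SPEEDUP, total_cpus)

print(f"CPUs in cluster:      {total_cpus}")
print(f"Parallel efficiency:  {efficiency:.3f}")
print(f"IPC ratio, SPECint95: {percent_to_ratio(IPC_GAIN_SPECINT95):.3f}")
print(f"IPC ratio, best case: {percent_to_ratio(IPC_GAIN_BEST_CASE):.3f}")
```

Under these assumptions the simulator runs at roughly 89% of ideal linear scaling on 18 CPUs, and the MPE results correspond to IPC ratios of about 1.13 overall and 1.34 in the best case relative to the baseline.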

Keywords/Search Tags:

CMP, Parallel, Simulation

Related items

1	Parallel Algorithms And Parallel Implementation Of Meshless Numerical Simulation
2	General Purpose Parallel Discrete Event Simulation Environment And The Study Of Relevant Techniques
3	The Research On Mixed Traffic Parallel Simulation Based On Grid
4	Research On Key Techniques Of Component-based Parallel Simulation Engine
5	Research On The Problems Of The Description And The Simulation Of Mobile Agents
6	Develop Of A Parallel Simulation Scheduler For NS2 Over A Cluster Of Computers
7	Research And Design Of A Parallel Interconnection Network Simulation Platform
8	Research On Large-scale CPU And MIC Heterogeneous Parallel Computing Techniques For Detonation Simulation
9	Network Parallel Simulation Methods Research Based On NS3
10	Cache Optimizations And Parallel Simulation For Multi-threaded Workloads