
Multicore processor and hardware transactional memory design space evaluation and optimization using multithreaded workload synthesis

Posted on: 2011-08-01
Degree: Ph.D
Type: Thesis
University: University of Florida
Candidate: Hughes, Clayton M
Full Text: PDF
GTID: 2448390002968521
Subject: Engineering
Abstract/Summary:
The design and evaluation of microprocessor architectures is a difficult and time-consuming task. Although small, hand-coded microbenchmarks can be used to accelerate performance evaluation, these programs lack the complexity to stress increasingly complex architecture designs. Larger and more complex real-world workloads should be employed to measure the performance of a given design and to evaluate the efficiency of various design alternatives. These applications can take days or weeks if run to completion on a detailed architecture simulator. In the past, researchers have applied machine learning and statistical sampling methods to reduce the average number of instructions required for detailed simulation. Others have proposed statistical simulation and workload synthesis, which can produce programs that emulate the execution characteristics of the application from which they are derived but have a much shorter execution period than the original. However, these existing methods are difficult to apply to multithreaded programs and can result in simplifications that miss the complex interactions between multiple concurrently running threads.

This study focuses on developing new techniques for accurate and effective multithreaded workload synthesis for both lock-based and transactional memory programs. The resulting benchmarks can significantly accelerate architecture design evaluations of multicore processors. For benchmarks derived from real applications, the study proposes synchronized statistical flow graphs, which incorporate inter-thread synchronization and sharing behavior to capture the complex characteristics and interactions of multiple threads. It also proposes a thread-aware data reference model and a wavelet-based branch model to generate accurate memory access and dynamic branch statistics. Experimental results show that a framework integrating these models can automatically generate synthetic programs that maintain the characteristics of the original workloads but have significantly reduced runtime.

This work also provides techniques for generating parameterized transactional memory benchmarks based on a statistical representation, decoupled from the underlying transactional model. Using principal component analysis, clustering, and raw transactional performance metrics, it is shown that the resulting tool, TransPlant, can generate benchmarks with features that lie outside the boundary occupied by traditional benchmarks. It is also shown how TransPlant can mimic the behavior of SPLASH-2 and STAMP transactional memory workloads. The program generation methods proposed here will help transactional memory architects select a robust set of programs for quick design evaluations in both the power and performance domains.
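To make the idea of workload synthesis from a statistical flow graph concrete, the following is a minimal single-threaded sketch, not the thesis's implementation. It assumes a profiled trace of basic-block IDs, records transition probabilities between blocks, and then random-walks the resulting graph to emit a shorter synthetic sequence. The function names (`build_flow_graph`, `synthesize`) and the trace are hypothetical, and the sketch omits the inter-thread synchronization and sharing information that the synchronized statistical flow graphs described above add.

```python
import random
from collections import Counter, defaultdict


def build_flow_graph(trace):
    """Count block-to-block transitions in a trace and normalize to probabilities."""
    counts = defaultdict(Counter)
    for src, dst in zip(trace, trace[1:]):
        counts[src][dst] += 1
    return {
        src: {dst: n / sum(succ.values()) for dst, n in succ.items()}
        for src, succ in counts.items()
    }


def synthesize(graph, start, length, seed=0):
    """Random-walk the graph to emit a synthetic block sequence of at most `length`."""
    rng = random.Random(seed)
    out = [start]
    # Stop early if the walk reaches a block with no recorded successor.
    while len(out) < length and out[-1] in graph:
        succ = graph[out[-1]]
        out.append(rng.choices(list(succ), weights=list(succ.values()))[0])
    return out


# Hypothetical profiled trace of basic-block IDs for one thread (assumption).
profile = ["A", "B", "C", "A", "B", "D", "A", "B", "C", "A"]
graph = build_flow_graph(profile)
synthetic = synthesize(graph, start="A", length=6)
```

The synthetic sequence only follows transitions observed in the profile, so its block-level control flow statistically resembles the original while its length is chosen independently of the original run.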
Keywords/Search Tags:Transactional memory, Evaluation, Programs, Benchmarks, Workload, Multithreaded, Performance