Memory profiling on shared-memory multiprocessors

Posted on: 2003-12-20
Degree: Ph.D.
Type: Dissertation
University: Stanford University
Candidate: Gibson, Jeffrey Steven
Full Text: PDF
GTID: 1468390011482709
Subject: Engineering
Abstract/Summary:
Tuning application memory performance can be difficult on any system but is particularly so on distributed shared-memory (DSM) multiprocessors. This is due to the implicit nature of communication, the unforeseen interactions among the processors, and the long remote memory latencies. Tools, called memory profilers, that allow the user to map memory behavior back to application data structures can be invaluable aids to the programmer. Unfortunately, memory profiling is difficult to implement efficiently since most systems lack the requisite hardware support. This dissertation introduces two techniques for efficient memory profiling, each requiring hardware support on either the processor or the system node controller.

The first technique, called TrapPoint, uses processor support for a trapping cache miss to point out memory bottlenecks. We construct a prototype on the versatile FLASH multiprocessor to study its feasibility. We show that modest processor support can be used to construct a useful memory profiler with acceptable overhead.

The FlashPoint memory profiler uses support on the system node controller to collect similar performance information. The FLASH multiprocessor was designed to allow for instrumentation of the node controller, enabling us to construct a prototype. Since profiling is done in the node controller, FlashPoint has access to more information about the memory traffic, such as cache-coherence events, than a processor-based monitor such as TrapPoint. It is therefore able to collect an extended memory profile.

Although FlashPoint requires more hardware support than TrapPoint, it overcomes many of TrapPoint's shortcomings. The required actions for memory profiling are quite similar to those required for cache coherence, so there are numerous synergies in implementing memory profiling on the same node controller that manages the cache-coherence protocol. Performing memory profiling in the node controller therefore allows a memory profiler to collect more data with lower overhead and higher accuracy than is possible on the processor.

Since memory profiling data can be so valuable and it can be collected with relatively little hardware support, we argue that future DSM multiprocessors should be designed with support for memory profiling. This support is best done in the system node controller, but for implementations where this is infeasible, an acceptable monitor can be implemented with processor support.
Keywords/Search Tags:Memory, Processor, Node controller, Support