This thesis focuses on a low-overhead technique for collecting low-level execution profiles from programs. These profiles are useful to processor designers working on future microprocessors and to software designers seeking to extract maximum performance from current hardware. As part of this technique, we develop an algorithm that groups basic blocks into paths using program loop structure and then records the execution frequency of these paths. We implement the algorithm both in pure software and in a mixed hardware/software environment, and we perform experiments to verify functionality and evaluate performance. Our results show that a software approach to path profiling is viable and that a dedicated hardware collector and compressor improves performance considerably.