Adaptive predication via compiler-microarchitecture cooperation

Posted on: 2008-09-03
Degree: Ph.D.
Type: Dissertation
University: The University of Texas at Austin
Candidate: Kim, Hyesoon
Full Text: PDF
GTID: 1448390005968395
Subject: Engineering
Abstract/Summary:
Even after decades of research in branch prediction, branch predictors remain imperfect, which results in significant performance loss in aggressive processors that support large instruction windows and deep pipelines. Predicated execution can reduce the number of branch mispredictions by eliminating hard-to-predict branches. However, the additional instruction overhead and data dependencies due to predicated execution sometimes offset the performance benefits of having fewer mispredictions. This dissertation presents two cooperative compiler-microarchitecture mechanisms that reduce the branch misprediction penalty by combining predicated execution and branch prediction.

The first mechanism is a set of new control-flow instructions, called wish branches. With wish branches, the compiler generates code that can be executed either as normal branch code or as predicated code. At run-time, the hardware chooses between normal branch code and predicated code based on the run-time branch behavior and the estimated run-time effectiveness of each solution. The results show that wish branches can significantly improve both performance and energy efficiency compared to predication or branch prediction.

To provide the benefit of predicated code to non-predicated Instruction Set Architectures (ISAs) and to increase the benefit of predicated execution beyond the benefit of wish branches, this dissertation also presents and evaluates the Diverge-Merge Processor (DMP) architecture. In the diverge-merge processor, the compiler analyzes the control-flow graphs of the program and marks branches suitable for dynamic predication (called diverge branches) and their corresponding control-flow merge points. The hardware not only chooses whether to use branch prediction or predication, but also decides which instructions after a branch should be predicated, based on run-time branch behavior.
This solution significantly reduces the overhead of predicated code and allows a very large set of control-flow graphs to be predicated, neither of which was possible previously because predication was performed statically without any run-time information. This dissertation compares DMP with all other major previously proposed branch processing paradigms available in the literature in terms of performance, power, energy consumption, and complexity. The results show that DMP is the most energy-efficient and highest-performing paradigm for branch handling. Code generation algorithms for the DMP architecture and cost-benefit analysis models of dynamic predication are also evaluated.
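
As a rough illustration of the trade-off the abstract describes, the sketch below contrasts normal branch code with a predicated (if-converted) version of the same computation, and then picks between them at run time. It is only a software analogy under stated assumptions: the function names and the `branch_is_predictable` flag are hypothetical stand-ins for a hardware decision, and none of this is taken from the dissertation itself.

```python
def abs_diff_branch(a: int, b: int) -> int:
    """Normal branch code: only one side of the branch is executed."""
    if a > b:
        return a - b
    return b - a


def abs_diff_predicated(a: int, b: int) -> int:
    """Predicated form: both sides are computed and a predicate selects one.

    There is no branch to mispredict, but the work of both paths is always
    paid for, which is the overhead the abstract refers to.
    """
    p = int(a > b)            # predicate: 1 if the branch would be taken, else 0
    taken = a - b             # computed regardless of p
    not_taken = b - a         # computed regardless of p
    return p * taken + (1 - p) * not_taken


def abs_diff_adaptive(a: int, b: int, branch_is_predictable: bool) -> int:
    """Hypothetical run-time choice between the two forms.

    `branch_is_predictable` is an assumed input standing in for whatever
    run-time information the hardware would use; it is not a real interface.
    """
    if branch_is_predictable:
        return abs_diff_branch(a, b)
    return abs_diff_predicated(a, b)
```

Both forms return the same value for every input; they differ only in how the work is done, which is why a run-time choice between them can be made without changing program results.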
Keywords/Search Tags:Branch, Predication, Code, DMP, Predicated, Performance