Font Size: a A A

Control-Flow Decoupling: An Approach for Timely, Non-speculative Branching

Posted on:2014-04-19Degree:Ph.DType:Dissertation
University:North Carolina State UniversityCandidate:Al Sheikh, Rami MohammadFull Text:PDF
GTID:1452390005995530Subject:Engineering
Abstract/Summary:
Mobile and PC/server class processor companies continue to roll out flagship core microarchitectures that are faster than their predecessors. Meanwhile placing more cores on a chip coupled with constant supply voltage puts per-core energy consumption at a premium. Hence, the challenge is to find future microarchitecture optimizations that not only increase performance but also conserve energy. Eliminating branch mispredictions --- which waste both time and energy --- is valuable in this respect.;In this work, we explore the control-flow landscape by characterizing mispredictions in four benchmark suites. We find that a third of mispredictions-per-1K-instructions (MPKI) come from what we call separable branches: branches with large control-dependent regions (not suitable for if-conversion), whose backward slices do not depend on their control-dependent instructions or have only a short dependence. We propose control-flow decoupling (CFD) to eradicate mispredictions of separable branches. The idea is to separate the loop containing the branch into two loops: the first contains only the branch's predicate computation and the second contains the branch and its control-dependent instructions. The first loop communicates branch outcomes to the second loop through an architectural queue. Microarchitecturally, the queue resides in the fetch unit to drive timely, non-speculative fetching or skipping of successive dynamic instances of the control-dependent region.;Either the programmer or compiler can transform a loop for CFD, and we evaluate both. On a microarchitecture configured similar to Intel's Sandy Bridge core, CFD increases performance by up to 55%, and reduces energy consumption by up to 49%. Moreover, for some applications, CFD is a necessary catalyst for future complexity-effective large-window architectures to tolerate memory latency.
Keywords/Search Tags:CFD, Branch, Control-flow
Related items