Software Simultaneous Multithreading Through Compilatio

Posted on: 2019-11-12
Degree: Ph.D.
Type: Dissertation
University: University of Delaware
Candidate: Chen, Yuanfang
Full Text: PDF
GTID: 1478390017487523
Subject: Computer Engineering
Abstract/Summary:
With Dennard scaling having broken down long ago, computer architecture design has progressed toward wider rather than deeper organizations. There are three ways to design a wider architecture: 1. putting more cores on the die to exploit thread-level parallelism (TLP); 2. putting more execution ports in the pipeline to exploit instruction-level parallelism (ILP); 3. making vector registers wider to exploit data-level parallelism (DLP). To speed up a wide spectrum of applications, modern CPUs usually have all three characteristics at the same time. However, not all applications can make effective use of them simultaneously. Using even one of them efficiently remains a challenging problem in the optimizing-compiler research community, even though these problems are not new. Processor architects designed simultaneous multithreading (SMT) to alleviate the problem.

Simultaneous multithreading is an essential technique for improving pipeline resource utilization and the overall power efficiency of chips, especially when the processor is either wide or built around an in-order pipeline. For a wide-issue superscalar processor, there are two kinds of wasted issue slots: vertical waste, where all issue slots in a cycle are empty; and horizontal waste, where the issue slots in a cycle are partially empty [74]. Simultaneous multithreading, unlike its two counterparts, fine-grained multithreading and coarse-grained multithreading, can fill both vertical and horizontal waste, hence enhancing overall efficiency. From the user application's point of view, there are two ways to improve speed or throughput: thread-level parallelism (TLP) and instruction-level parallelism (ILP). Simultaneous multithreading can exploit both TLP and ILP in the same cycle, whereas fine-grained or coarse-grained multithreading can exploit only one of TLP or ILP in a single cycle.

Despite all the benefits of simultaneous multithreading (SMT), semiconductor chip makers have adopted it slowly. AMD's most recent Zen processor is its first CPU product featuring SMT. The only other well-known chip makers that offer SMT-enabled processors are Intel and IBM. The reason is that SMT is very complex to implement. Many of the pipeline stages and the memory system need additional hardware logic for an efficient SMT implementation. For embedded chips, SMT is not even an affordable choice.

To harvest the benefits provided by SMT without incurring significant hardware costs, we propose a compiler-based SMT implementation framework called CSSMT that achieves performance comparable to hardware-based SMT. With the help of advanced profiling techniques enabled by precise PMU counters in modern CPUs, CSSMT can identify applications that could potentially benefit from SMT and guide our LLVM-based compiler to merge the hot spots of the respective threads so that they co-run in the same pipeline. CSSMT is orthogonal to hardware SMT and can bail out when merging is not profitable according to its cost model, which is derived from profiling data.
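
The distinction between vertical and horizontal waste described in the abstract can be made concrete with a small sketch. This is only an illustrative toy model, not the dissertation's method: the function names `count_waste` and `co_issue`, the per-cycle issue traces, and the issue width of 4 are all assumptions chosen for the example, and the co-issue model ignores dependencies and resource conflicts between the two threads.

```python
from typing import List, Tuple


def count_waste(issued_per_cycle: List[int], width: int) -> Tuple[int, int]:
    """Return (vertical, horizontal) wasted issue slots for a trace.

    issued_per_cycle[i] is the number of instructions issued in cycle i
    on a processor with `width` issue slots per cycle (hypothetical trace).
    """
    vertical = 0
    horizontal = 0
    for issued in issued_per_cycle:
        if not 0 <= issued <= width:
            raise ValueError("issued count must be between 0 and width")
        if issued == 0:
            # Every slot in this cycle is empty: vertical waste.
            vertical += width
        else:
            # Only some slots are empty: horizontal waste.
            horizontal += width - issued
    return vertical, horizontal


def co_issue(a: List[int], b: List[int], width: int) -> List[int]:
    """Toy model of two threads sharing one pipeline in the same cycles.

    Instructions that do not fit in a cycle are carried over to later
    cycles. This is an idealized assumption, not a real SMT model.
    """
    if len(a) != len(b):
        raise ValueError("traces must have the same length")
    merged = []
    backlog = 0
    for x, y in zip(a, b):
        backlog += x + y
        issued = min(width, backlog)
        merged.append(issued)
        backlog -= issued
    while backlog > 0:
        # Drain whatever is left after both traces end.
        issued = min(width, backlog)
        merged.append(issued)
        backlog -= issued
    return merged


WIDTH = 4
thread_a = [2, 0, 1, 0, 3]
thread_b = [1, 2, 0, 0, 1]

print(count_waste(thread_a, WIDTH))
print(count_waste(thread_b, WIDTH))
print(count_waste(co_issue(thread_a, thread_b, WIDTH), WIDTH))
```

In this toy trace, running the two threads together leaves fewer fully empty cycles than either thread has alone, which is the effect the abstract attributes to filling vertical and horizontal waste in the same cycle.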
Keywords/Search Tags: SMT, Simultaneous multithreading, Level parallelism, TLP, ILP