
Exploiting ILP, LLP and TLP in multi-core processors using off-the-critical path reconfigurable hardware

Posted on: 2011-01-24
Degree: Ph.D
Type: Dissertation
University: State University of New York at Binghamton
Candidate: Suri, Tameesh
Full Text: PDF
GTID: 1448390002452227
Subject: Engineering
Abstract/Summary:
There has been a major shift in the microprocessor industry towards designing simpler CPU cores that have considerable area, complexity and power advantages. These cores are then leveraged in large-scale multi-core processors or in SoCs for hand-held devices. The shift in the design paradigm has been fueled by the unsustainable power consumption of complex high-performance cores and the diminishing returns on investment in them. As more transistors are integrated on a chip, the number of CPU cores in future many-core systems is expected to double every year. However, increasing the number of cores in a multi-core processor can only be achieved by reducing the resources available in each core, and hence sacrificing per-core performance. Having a large number of homogeneous cores may not be effective for all applications. For instance, threads with high instruction-level parallelism will under-perform considerably on resource-constrained cores. Furthermore, the lower performance of each individual core also reduces the energy efficiency of the overall system.

In this dissertation, we propose various microarchitectural designs that can be adapted to improve a single thread's performance or to increase the overall throughput by executing multiple threads. In particular, we integrate a Reconfigurable Hardware Unit (RHU) into the resource-constrained cores of a many-core processor. The RHU can be reconfigured to execute the frequently encountered instructions of a thread in order to increase the core's overall execution bandwidth, thus improving its performance. On the other hand, if the core's resources are sufficient for a thread, the RHU can be configured to execute instructions from a different thread to increase thread-level parallelism. The RHU has low area overhead, and hence has minimal impact on the scalability of the number of cores.

Our experiments show that the proposed architecture improves per-core performance by an average of about 23% to 105% using various microarchitectural techniques and RHU structures across a wide range of applications, while incurring a per-core area overhead of 5% to 12%. Furthermore, the results show that the RHU-based architecture can improve the throughput of a simple core by about 33%, with an area overhead of about 13%. Instructions that execute on the RHU do not consume the core's resources, which reduces the energy consumed by critical microarchitectural structures, such as the register file and the dynamic scheduler, by about 40%. Finally, we use the energy-delay product to show that our best-case architecture is almost twice as energy-efficient as the base simple core.
Keywords/Search Tags: Core, RHU, Area