Decoupled Vector-Fetch Architecture with a Scalarizing Compiler

Posted on: 2017-07-29

Degree: Ph.D.

Type: Thesis

University: University of California, Berkeley

Candidate: Lee, Yunsup

GTID: 2478390014499416

Subject: Computer Science

Abstract/Summary:

As we approach the end of conventional technology scaling, computer architects are forced to incorporate specialized and heterogeneous accelerators into general-purpose processors for greater energy efficiency. Among the accelerators that have recently gained popularity are data-parallel processing units, such as classic vector units, SIMD units, and graphics processing units (GPUs). Surveying a wide range of data-parallel architectures, together with their parallel programming models and compilers, reveals an opportunity to construct a new data-parallel machine that is highly performant and efficient, yet remains a favorable compiler target with the same level of programmability as the others.

In this thesis, I present the Hwacha decoupled vector-fetch architecture as the basis of a new data-parallel machine. I reason through the design decisions while describing its programming model, its microarchitecture, and the LLVM-based scalarizing compiler that efficiently maps OpenCL kernels to the architecture. The Hwacha vector unit is implemented in Chisel as an accelerator attached to a RISC-V Rocket control processor within the open-source Rocket Chip SoC generator. Using complete VLSI implementations of Hwacha in a commercial 28 nm process, including a cache-coherent memory hierarchy, together with simulated LPDDR3 DRAM modules, I quantify the area, performance, and energy consumption of the Hwacha accelerator. These numbers are then compared against an ARM Mali-T628 MP6 GPU, also built in a 28 nm process, using a set of OpenCL microbenchmarks compiled from the same source code with our custom compiler and with ARM's stock OpenCL compiler.
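The abstract does not spell out what "scalarizing" involves. As a rough illustration only, a common building block of compilers that map SPMD kernels (such as OpenCL) onto vector hardware is a uniformity analysis: a value that is identical for every work-item can be computed once as a scalar, while anything derived from the work-item index must be kept per-element in vector registers. The sketch below is hypothetical: the `Instr` IR, the op names, and `classify` are invented for illustration and are not taken from the thesis or from LLVM. It handles straight-line code only and ignores divergence introduced by control flow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str
    op: str
    operands: tuple[str, ...] = ()


# Hypothetical op whose result differs per work-item regardless of its operands.
VARYING_SOURCES = {"get_global_id"}


def classify(kernel_args, body):
    """Label every value in a straight-line kernel as 'scalar' or 'vector'."""
    # Kernel arguments are the same for every work-item, so they start as scalar.
    kinds = {arg: "scalar" for arg in kernel_args}
    for ins in body:
        # Constants and unknown names default to scalar; any vector operand
        # (or a varying source op) makes the result vary across work-items.
        varying = ins.op in VARYING_SOURCES or any(
            kinds.get(src, "scalar") == "vector" for src in ins.operands
        )
        kinds[ins.dest] = "vector" if varying else "scalar"
    return kinds


# Hypothetical kernel body: r[i] = (a * b) * x[i] + y[i]
args = ["a", "b", "x", "y"]
body = [
    Instr("ab", "mul", ("a", "b")),       # same for all work-items
    Instr("i", "get_global_id", ("0",)),  # work-item index
    Instr("px", "gep", ("x", "i")),       # address of x[i]
    Instr("xi", "load", ("px",)),
    Instr("t", "mul", ("ab", "xi")),
    Instr("py", "gep", ("y", "i")),       # address of y[i]
    Instr("yi", "load", ("py",)),
    Instr("r", "add", ("t", "yi")),
]

kinds = classify(args, body)
for ins in body:
    # Only `ab` is labeled scalar; everything derived from `i` is vector.
    print(f"{ins.dest:>3}: {kinds[ins.dest]}")
```

In this toy example, hoisting `ab` out of the per-element work means the multiply `a * b` is executed once rather than once per work-item, which is the kind of redundancy a scalarizing compiler aims to remove.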

Keywords/Search Tags:

Compiler, Architecture

Related items

1	An optimizing C compiler for a general purpose DSP architecture
2	Decoupled Vector-Fetch Architecture with a Scalarizing Compiler
3	Compiler support for a multimedia system-on-chip architecture
4	Low-Power Techniques For Architecture And Compiler Optimization
5	Research And Implementation Of Key Technologies Of MXXXX DSP Compiler
6	Architectural and compiler issues for tolerating latencies in horizontal architectures
7	A programmable architecture and compiler for microfluidics
8	Research On Implementation And Optimization Of BWDSP100 Compiler
9	Complementary compiler and architecture features for embedded VLIW processors
10	Construction of a highly-optimizing compiler for variable instruction set architecture