
High-level synthesis of distributed architectures for memory-intensive applications

Posted on: 2006-01-26    Degree: Ph.D    Type: Thesis
University: Princeton University    Candidate: Huang, Chao    Full Text: PDF
GTID: 2458390008975122    Subject: Engineering
Abstract/Summary:
High-level synthesis (HLS) has been a topic of research for a long time, yet it has seen limited adoption in practice. We believe a key reason is that the quality of designs synthesized by HLS tools does not compare favorably with manual designs, given the wide range of advanced architectural techniques that experienced designers employ. While the basic concepts of HLS (e.g., scheduling, resource allocation/binding, and state-machine extraction) are well established, its capabilities must be extended, by incorporating novel architectures in the context of application-specific integrated circuits (ASICs), to close the quality gap between HLS outputs and manually designed electronic systems.

In the world of ASIC design, a wide variety of application domains, including database management, multimedia processing, and scientific computing, are characterized by large volumes of memory references interleaved with computation. These memory-intensive applications present unique challenges to designers in the choice of memory organization, memory size requirements, bandwidth, and access latencies, which can result in poor utilization of computational logic. While several techniques have been developed to optimize memory accesses and the computational logic separately during HLS, significantly higher-quality designs result when the two are addressed in a synergistic manner.

In this dissertation, we present a suite of electronic design automation (EDA) techniques for HLS of distributed architectures, i.e., architectures in which computing logic and data memory are jointly distributed into several partitions across the chip, with computational tasks and the corresponding memory data integrated into each partition based on memory access patterns. These techniques are developed to optimize ASIC designs for memory-intensive applications.
Novel architectural templates are proposed, including homogeneous and heterogeneous distributed logic-memory architectures and computation-unit-integrated memories. We also identify behavioral transformations that facilitate efficient HLS of distributed logic-memory architectures by exposing latent parallelism in the given applications. Our design methodologies are evaluated jointly through a case study, an ASIC implementation of JPEG still-image compression, using the TSMC 0.13 µm 1.2 V eight-layer-metal CMOS process in the context of a commercial design flow. The hybrid design demonstrates that distributed ASIC architectures can achieve significant performance improvements and a reduced energy-delay product over a conventional monolithic design (a single processing unit, e.g., a controller-datapath pair, communicating with a memory or a memory hierarchy).

Conventional HLS tools are capable of extracting parallelism from behavioral descriptions for monolithic architectures. Our work provides techniques that extend the synthesis frontier to more general architectures, extracting both coarse- and fine-grained parallelism from data accesses and computations in a synergistic manner. Our design framework explores many possible ways of organizing data and computations, carefully examines the trade-offs (i.e., communication overheads, synchronization costs, and area overheads) in choosing one solution over another, and uses conventional HLS techniques for the intermediate steps. The proposed methodologies require no changes to the core HLS algorithms; hence, we believe that existing HLS flows can easily be adapted to take advantage of our techniques.
Keywords/Search Tags:HLS, Architectures, Memory, Synthesis, Distributed, Techniques, Applications, ASIC