Research On High Level Synthesis Of IP Core For Specific Applications

Posted on: 2009-12-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Z Dong
GTID: 1118360278956541
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the rapid development of integrated circuit (IC) design, system-on-chip (SoC) technology has been widely adopted and is increasingly involved in many fields of electronics. SoC has in fact become a trend in current very-large-scale integration (VLSI) design.
The intellectual property (IP) core is the basis and kernel of SoC design. SoC designers try to reuse existing IP cores as much as possible, so that a whole project can be completed largely by integrating them. IP cores oriented toward specific applications embody the innovation of an SoC and are also a key factor in design speed. High-level synthesis (HLS) of IP cores raises the level of design by transforming a behavior-level description into a structure-level, or even layout-level, description. HLS frees designers from complicated hardware design and lets them focus on high-level system design, which increases the efficiency and validity of SoC design and reduces cost at the same time. As a result, this technology has gained wide recognition from academia and industry since it was first proposed, and it remains promising.
Of particular interest to this thesis are sliding-window applications, which are widely used in signal, image and video processing and require much computation and data manipulation. Many HLS systems start with this kind of application because of its particular memory access pattern. Unfortunately, current work still has various limitations: some approaches do not define the memory architecture explicitly, some do not exploit data reuse adequately, some use large numbers of memory elements and registers, and some do not address design space exploration.
This thesis studies in depth several key problems in the HLS of IP cores for sliding-window operations, as outlined below.
To address the inherent characteristics of sliding-window operations and the limitations of current work, we propose a parameterized memory architecture that automatically generates the hardware framework for all sliding-window applications. The objective is to exploit data reuse as fully as possible, so as to reduce the number of memory accesses and speed up execution. A three-level memory structure realizes inner-loop and outer-loop data reuse, and shift registers are used to simplify the hardware design. The architecture is determined by a set of parameters whose values are obtained from the compiler. We propose a parameter generation algorithm that handles the different kinds of data reuse. Compared with related work, our approach uses only a small number of memory elements and registers, reduces the execution clock cycles by 2.13X and up to 3.8X, and raises the frequency from 69 MHz to more than 200 MHz.
Based on the parameterized memory architecture, we study the generation of RTL hardware descriptions, with the aim of generating the Verilog code of an IP core automatically. The work has three parts: automatic generation of controllers, automatic generation of pipelined operations, and generation of the overall encapsulation module. First, the compiler partitions the source code into two parts: the control cell and the operation cell. The control cell is analyzed in the compiler to obtain the values of several parameters, including loop information (the initial value, end value and step of each loop) and data reuse information. We present a controller generation algorithm, so that the controllers can be generated automatically from these parameters.
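To make the data-reuse goal of the memory architecture concrete, the following is a minimal software sketch, not the thesis's hardware design. It assumes a k x k window sum as the operation and models only two buffering levels: line buffers holding the last k rows (reuse across the outer loop) and a shift register of k columns (reuse across the inner loop). The function names `naive_reads` and `window_sums_with_reuse` are hypothetical.

```python
from collections import deque


def naive_reads(rows, cols, k):
    """Memory reads if every k x k window refetches all of its elements."""
    return (rows - k + 1) * (cols - k + 1) * k * k


def window_sums_with_reuse(image, k):
    """Sum each k x k window while reading every input element exactly once."""
    cols = len(image[0])
    line_buffers = deque(maxlen=k)  # last k rows: reuse across the outer loop
    reads = 0
    sums = []
    for row in image:
        line_buffers.append(list(row))  # one pass over the row = cols reads
        reads += cols
        if len(line_buffers) < k:
            continue  # not enough rows buffered yet
        window = deque(maxlen=k)  # shift register of columns: inner-loop reuse
        out_row = []
        for c in range(cols):
            # Shift in one new column; the other k-1 columns are reused.
            window.append([buf[c] for buf in line_buffers])
            if len(window) == k:
                out_row.append(sum(sum(col) for col in window))
        sums.append(out_row)
    return sums, reads
```

For a 4 x 4 input and a 3 x 3 window, this model reads 16 elements from the input array, whereas refetching every window would take 36 reads.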
The operation cell is processed in the compiler through a series of steps: defining the data structures, analyzing dependences, and creating a description of the data dependence flow. Based on this description, we partition the datapath into pipeline stages and express the source program in a new intermediate representation (IR), from which the pipelined operations are generated. Finally, the overall encapsulation module integrates the controller module, operation module, RAM module and so on, completing the generation of the RTL hardware description. Our approach avoids the complexity and inefficiency of manual design, and the results are comparatively better.
Next, this thesis studies design space exploration according to whether on-chip resources are sufficient. We present a design space exploration approach for the case where on-chip resources are abundant; its aim is to use the resources fully, increase parallelism, and reduce the number of execution clock cycles. By finding three upper bounds, derived from area constraints (measured by the number of logic operation units), memory bandwidth constraints and on-chip memory constraints, we determine the block structure of the design that fully utilizes the resources available on the board. Loop unrolling is applied as much as possible when on-chip area is abundant. When memory elements are insufficient, the input data array is partitioned horizontally into several pieces, and the data within a piece is processed in a pipelined fashion to reduce the number of memory accesses as far as possible. Experiments show that memory utilization efficiency can reach 85% and that, compared with current work, the number of memory accesses is reduced by 2% to 20%.
Some large applications consist of many loop nests. Mapping all of the loop nests of such an application onto a target chip may be impractical because of the limited on-chip area, and the traditional method of designing a dedicated IP core for every loop nest is awkward.
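The bound-based exploration described above can be sketched as taking the tightest of three limits. The abstract does not say how each bound is computed, so the integer-division formulas below, the function name `explore_parallelism` and all parameter names are assumptions made purely for illustration.

```python
def explore_parallelism(logic_units, units_per_window,
                        words_per_cycle, words_per_window,
                        onchip_words, buffer_words_per_window):
    """Return (parallel windows, limiting constraint, all three bounds).

    Each bound is a hypothetical "capacity // cost per window" estimate.
    """
    bounds = {
        # Area: how many window datapaths fit in the logic operation units.
        "area": logic_units // units_per_window,
        # Bandwidth: how many windows can be fed with new data each cycle.
        "bandwidth": words_per_cycle // words_per_window,
        # On-chip memory: how many windows' buffers fit on the chip.
        "memory": onchip_words // buffer_words_per_window,
    }
    limiting = min(bounds, key=bounds.get)  # tightest of the three bounds
    return bounds[limiting], limiting, bounds
```

With 64 logic units at 9 per window, 4 words per cycle at 1 new word per window, and 3072 on-chip words at 512 per window, the three bounds are 7, 4 and 6, so this sketch would pick 4 parallel windows with bandwidth as the limiting constraint.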
This thesis presents a pipelined template that is common to all loop nests in an application, so that the loop nests can be executed on the template one after another. We decide the number of function units (FUs) according to the on-chip resources and the characteristics of the specific application. Based on iterative modulo scheduling from software pipelining and the ShiftQ architecture, we schedule the instructions of each loop nest and automatically generate the registers that hold intermediate results. Experiments show that the pipelined template achieves an execution cycle count for a loop comparable to that of dedicated hardware, while saving the time of designing a specific IP core for every loop nest.
In summary, this work studies the HLS of IP cores for sliding-window operations and presents solutions to several key problems: memory architecture, hardware description code generation, and design space exploration in two situations. It has academic and practical value for advancing the theory and practicality of HLS of IP cores for specific applications.
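The abstract gives no scheduling details for the pipelined template. As background, iterative modulo scheduling conventionally starts from a lower bound on the initiation interval, and the resource-constrained part of that bound depends directly on the number of FUs chosen. The sketch below computes it under the simplifying assumptions that every operation occupies one FU of its type for one cycle and that recurrence constraints are ignored; `resource_min_ii` and the operation type names are hypothetical.

```python
import math
from collections import Counter


def resource_min_ii(op_types, fu_counts):
    """Resource-constrained lower bound on the initiation interval.

    op_types: the FU type required by each operation in the loop body.
    fu_counts: number of available function units per type.
    """
    demand = Counter(op_types)  # operations per FU type in one iteration
    return max(math.ceil(n / fu_counts[t]) for t, n in demand.items())
```

For a loop body with 9 multiplications and 8 additions on a template with 3 multipliers and 2 adders, the bound is max(ceil(9/3), ceil(8/2)) = 4 cycles per iteration.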
Keywords/Search Tags: SoC, IP core, high level synthesis, sliding-window operation, data reuse, design space exploration