Research On Dataflow Architecture-based High Level Synthesis For Graph Processing

Posted on:2021-12-08

Degree:Master

Type:Thesis

Country:China

Candidate:J W Tang

GTID:2518306104488084

Subject:Computer software and theory

Abstract/Summary:

Graph processing is one of the most important computational models for Big Data processing. It has been widely used in many fields, such as biological networks and Web ranking. Prior studies have shown that graph processing on traditional Central Processing Unit (CPU) and Graphics Processing Unit (GPU) architectures suffers from load imbalance, irregular communication, and random memory accesses, which greatly degrade its performance and energy efficiency. Field Programmable Gate Arrays (FPGAs) are widely recognized for their low power consumption and reconfigurability, and are often considered particularly well suited to improving the energy efficiency of graph processing. However, writing correct hardware-level code and verifying its functionality are notoriously tedious and time-consuming. General-purpose High Level Synthesis (HLS) systems can automatically generate the underlying hardware code from high-level descriptions, but because graph applications are irregular, existing HLS systems still lack effective support for massive parallelism and efficient memory access, which can lead to inefficient hardware architectures and significantly lower performance.

In this work, we propose a dataflow architecture-based HLS method for high-performance graph processing. To address the random memory accesses, power-law degree distribution, and nested loops that arise in graph processing, we design an upper-level programming model, an intermediate-level modular dataflow intermediate representation (IR), and an underlying parameterized hardware template, which together enable an efficient parallel pipeline. We also present a load-balancing mechanism for the processing units: adaptive transfer of vertex and edge information enables highly parallel processing of the workload, and dynamic scheduling of vertices with different degrees eliminates load imbalance. By partitioning the on-chip memory and controlling how memory accesses are scheduled, accesses issued in different cycles can be merged and processed in parallel, further improving memory-access efficiency.

Our approach generates high-performance hardware code for a graph algorithm in three steps: 1) the graph algorithm is described at a high level using the functional programming primitives provided by the upper-level programming model; 2) this description is converted into a modular dataflow IR; and 3) the modular IR is mapped onto the parameterized hardware templates. We implement our HLS method on a Xilinx UltraScale+ VU9P FPGA and verify its correctness and effectiveness. The results show that our HLS method outperforms the state-of-the-art Spatial compiler, with speedups of 7.9x to 30.6x.
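
The load-balancing idea in the abstract (dynamically scheduling vertices of very different degrees across processing units) can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the thesis's mechanism: it assumes that high-degree vertices are split into fixed-size edge chunks and that a chunk's cost equals its edge count. The names `split_into_chunks`, `schedule`, `Chunk`, and `chunk_size` are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical work unit: (vertex id, offset of first edge, number of edges).
Chunk = Tuple[int, int, int]


def split_into_chunks(degrees: Dict[int, int], chunk_size: int) -> List[Chunk]:
    """Cut every vertex's edge list into pieces of at most chunk_size edges,
    so that one high-degree (hub) vertex cannot occupy a single PE alone."""
    chunks: List[Chunk] = []
    for v, deg in degrees.items():
        for start in range(0, deg, chunk_size):
            chunks.append((v, start, min(chunk_size, deg - start)))
    return chunks


def schedule(chunks: List[Chunk], num_pes: int) -> List[List[Chunk]]:
    """Dispatch chunks in arrival order to whichever PE becomes free first.

    With cost taken as the number of edges, the PE with the least total load
    so far is the one that would finish first, so this models PEs pulling
    work from a shared queue at run time (an assumption, not a hardware model).
    """
    heap = [(0, pe) for pe in range(num_pes)]  # (edges assigned, PE id)
    heapq.heapify(heap)
    assignment: List[List[Chunk]] = [[] for _ in range(num_pes)]
    for chunk in chunks:
        load, pe = heapq.heappop(heap)
        assignment[pe].append(chunk)
        heapq.heappush(heap, (load + chunk[2], pe))
    return assignment


# One hub vertex plus a few low-degree vertices (a power-law-like skew).
degrees = {0: 1000, 1: 3, 2: 5, 3: 2, 4: 7, 5: 1}
plan = schedule(split_into_chunks(degrees, chunk_size=64), num_pes=4)
loads = [sum(c[2] for c in pe_chunks) for pe_chunks in plan]
print(loads)  # [256, 256, 256, 250]
```

Without chunking, the PE that receives vertex 0 would process at least 1000 edges while the others sit nearly idle; splitting the hub's edges and assigning work to the first free PE keeps the maximum load at 256.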
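
The abstract states that partitioning on-chip memory and scheduling accesses lets requests from different cycles be merged and served in parallel. A simplified software model of that idea, assuming cyclic partitioning (address modulo the number of banks), one read port per bank per cycle, and coalescing of duplicate addresses, is sketched here. `bank_of` and `merge_requests` are hypothetical names, and the thesis's actual partitioning scheme and scheduler may differ.

```python
from typing import Dict, List, Tuple


def bank_of(addr: int, num_banks: int) -> Tuple[int, int]:
    """Cyclic partitioning: returns (bank index, offset within that bank)."""
    return addr % num_banks, addr // num_banks


def merge_requests(addrs: List[int], num_banks: int) -> List[List[int]]:
    """Pack an in-order stream of read addresses into issue cycles.

    Each cycle serves at most one distinct address per bank; a repeated
    request for an address already issued in the current cycle is coalesced
    into it. A cycle is closed as soon as the next request would hit a bank
    that is already serving a different address.
    """
    cycles: List[List[int]] = []
    current: List[int] = []
    busy: Dict[int, int] = {}  # bank -> address served in the current cycle
    for a in addrs:
        bank, _ = bank_of(a, num_banks)
        if bank in busy and busy[bank] != a:
            cycles.append(current)  # bank conflict: start a new cycle
            current, busy = [], {}
        if bank not in busy:
            busy[bank] = a
            current.append(a)
    if current:
        cycles.append(current)
    return cycles


# Eight requests that would take eight cycles if issued one per cycle.
print(merge_requests([0, 1, 2, 3, 1, 5, 4, 8], num_banks=4))
# [[0, 1, 2, 3], [5, 4], [8]]
```

Here eight single-word requests complete in three merged cycles: the repeated read of address 1 is coalesced, and conflict-free addresses in different banks share a cycle.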
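
The three-step flow in the abstract (functional description, then modular dataflow IR, then parameterized templates) can be mimicked in a toy software pipeline. The abstract does not specify the thesis's primitives, IR, or template parameters, so everything here is hypothetical: the programming model is assumed to be scatter/gather/apply style, and `GraphProgram`, `Node`, `lower`, `map_to_templates`, `run_iteration`, the `*_unit` template names, and the `lanes` parameter are illustrative only. `run_iteration` is a software reference model of one pass through the IR, which is one way to check functional correctness before synthesis.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass
class GraphProgram:
    """Step 1 (hypothetical primitives): the user supplies three pure functions."""
    scatter: Callable[[Any, Any], Any]           # (source value, edge weight) -> message
    gather: Callable[[Any, Any], Any]            # associative combine of two messages
    apply: Callable[[Any, Optional[Any]], Any]   # (old value, combined message or None) -> new value


@dataclass
class Node:
    """One module of a toy dataflow IR."""
    op: str
    fn: Callable


def lower(prog: GraphProgram) -> List[Node]:
    """Step 2: lower the description to a linear chain of dataflow modules."""
    return [Node("scatter", prog.scatter), Node("gather", prog.gather), Node("apply", prog.apply)]


def map_to_templates(ir: List[Node], lanes: int) -> List[Dict[str, Any]]:
    """Step 3: bind each module to a parameterized template (names are illustrative)."""
    return [{"template": f"{n.op}_unit", "lanes": lanes} for n in ir]


Edge = Tuple[int, int, Any]  # (source, destination, weight)


def run_iteration(ir: List[Node], edges: List[Edge], values: List[Any]) -> List[Any]:
    """Software reference model: stream the edges through each IR module once."""
    stream: Any = edges
    for node in ir:
        if node.op == "scatter":
            stream = [(dst, node.fn(values[src], w)) for src, dst, w in stream]
        elif node.op == "gather":
            acc: Dict[int, Any] = {}
            for dst, msg in stream:
                acc[dst] = node.fn(acc[dst], msg) if dst in acc else msg
            stream = acc
        elif node.op == "apply":
            stream = [node.fn(values[v], stream.get(v)) for v in range(len(values))]
        else:
            raise ValueError(f"unknown IR op: {node.op}")
    return stream


# Example: one round of min-label propagation (connected components).
cc = GraphProgram(
    scatter=lambda val, w: val,
    gather=min,
    apply=lambda old, msg: old if msg is None else min(old, msg),
)
ir = lower(cc)
edges = [(0, 1, 1), (1, 0, 1), (1, 2, 1), (2, 1, 1)]  # path 0-1-2; vertex 3 isolated
print(run_iteration(ir, edges, [0, 1, 2, 3]))  # [0, 0, 1, 3]
print(map_to_templates(ir, lanes=8))
```

Keeping the IR as a chain of independent modules is what would let each stage be bound to its own template and pipelined; in this sketch that binding is reduced to a list of template names with a single parallelism parameter.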

Keywords/Search Tags:

Graph Processing, High Level Synthesis, Dataflow Architecture, Intermediate Representation, FPGA
