
Research On Dataflow Architecture-based High Level Synthesis For Graph Processing

Posted on: 2021-12-08
Degree: Master
Type: Thesis
Country: China
Candidate: J W Tang
GTID: 2518306104488084
Subject: Computer software and theory
Abstract/Summary:
Graph processing is one of the most important computational models for Big Data processing. It is widely used in a variety of fields, such as biological networks and Web page ranking. Prior studies have shown that graph processing on traditional Central Processing Unit (CPU) and Graphics Processing Unit (GPU) architectures suffers from load imbalance, irregular communication, and random memory accesses, which greatly degrade its performance and energy efficiency. Field Programmable Gate Arrays (FPGAs) are widely recognized for their low power consumption and reconfigurability, and are often considered particularly useful for improving the energy efficiency of graph processing. However, writing correct hardware-level code and verifying its functionality are notoriously tedious and time-consuming. General-purpose High Level Synthesis (HLS) systems can automatically generate the underlying hardware code from high-level descriptions, but because of the irregularity of graph applications, these existing HLS systems still lack effective support for massive parallelism and efficient memory access, which can lead to inefficient hardware architectures and significantly lower performance.

In this thesis, we propose a dataflow architecture-based HLS method for high-performance graph processing. To address the random memory accesses, power-law degree distributions, and nested loops that arise in graph processing, we design a programming model, a mid-level modular dataflow intermediate representation (IR), and an underlying parameterized hardware template, which together enable efficient parallel pipelines. We also present a load balancing mechanism for the processing units: adaptive transfer of vertex and edge information achieves highly parallel processing of the workloads, and dynamic scheduling of vertices with different degrees eliminates load imbalance. By partitioning the on-chip memory and controlling the scheduling of memory accesses, memory accesses issued in different cycles can be merged and processed in parallel, further improving memory access efficiency.

Our approach generates high-performance hardware code for a graph algorithm in three steps: 1) describing the graph algorithm at a high level using the functional programming primitives provided by the programming model, 2) converting this description into the modular dataflow IR, and 3) mapping the modular IR onto the parameterized hardware templates. We implemented our HLS method on a Xilinx UltraScale+ VU9P FPGA and verified its correctness and effectiveness. The results show that our method outperforms the state-of-the-art Spatial framework, with speedups of 7.9x to 30.6x.
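The abstract does not say how vertices of different degrees are dynamically scheduled, so the following is only an illustrative sketch of why degree-aware scheduling matters under a power-law distribution. It assumes one common approach, not necessarily the thesis's: each vertex's edge list is split into ranges of at most `chunk` edges, and the largest remaining range is greedily handed to the least-loaded processing unit (a longest-processing-time-first heuristic). The names `ProcessingUnit`, `static_round_robin`, `degree_aware_schedule`, and `chunk` are hypothetical, and the software model stands in for what would be a hardware dispatcher.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class ProcessingUnit:
    """Hypothetical software model of one hardware processing unit (PE)."""
    pe_id: int
    load: int = 0  # total number of edges assigned to this PE
    tasks: list = field(default_factory=list)  # (vertex, edge_start, edge_end)


def static_round_robin(degrees, num_pes):
    """Baseline: vertex v always goes to PE v % num_pes, regardless of its degree."""
    loads = [0] * num_pes
    for v, d in enumerate(degrees):
        loads[v % num_pes] += d
    return loads


def degree_aware_schedule(degrees, num_pes, chunk):
    """Split each vertex's edge list into ranges of at most `chunk` edges,
    then greedily assign the largest remaining range to the least-loaded PE."""
    tasks = []
    for v, d in enumerate(degrees):
        # A high-degree vertex yields several ranges; a low-degree vertex yields one.
        for start in range(0, d, chunk):
            tasks.append((v, start, min(start + chunk, d)))
    # Longest-processing-time-first: place big ranges before small ones.
    tasks.sort(key=lambda t: t[2] - t[1], reverse=True)

    pes = [ProcessingUnit(i) for i in range(num_pes)]
    heap = [(0, i) for i in range(num_pes)]  # (current load, pe_id), already a valid heap
    for v, start, end in tasks:
        _, i = heapq.heappop(heap)  # least-loaded PE
        pes[i].tasks.append((v, start, end))
        pes[i].load += end - start
        heapq.heappush(heap, (pes[i].load, i))
    return pes


if __name__ == "__main__":
    # Toy power-law-like graph: one hub vertex and many low-degree vertices.
    degrees = [100, 3, 2, 1, 1, 1, 1, 1]
    print("static loads:      ", static_round_robin(degrees, 4))
    print("degree-aware loads:", [pe.load for pe in degree_aware_schedule(degrees, 4, 16)])
```

With these toy degrees, the static assignment places 101 of the 110 edges on a single PE (loads `[101, 4, 3, 2]`), while the degree-aware greedy assignment yields loads of `[32, 32, 23, 23]`. A smaller `chunk` balances more finely but creates more ranges to dispatch and more partial results to merge for the split hub vertex; this trade-off is an assumption of the sketch, not a result reported in the thesis.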
Keywords/Search Tags: Graph Processing, High Level Synthesis, Dataflow Architecture, Intermediate Representation, FPGA