
Research On Automatic Data Placement For CPU-FPGA Heterogeneous Multiprocessor System-on-chips

Posted on: 2020-09-25
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Li
Full Text: PDF
GTID: 2428330572988979
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of applications such as deep learning and big data, traditional CPU-based architectures can no longer meet these applications' computational requirements. Researchers in industry and academia have turned to hardware accelerators to overcome the limitations of the traditional CPU-based architecture. FPGAs have attracted increasing attention due to their high energy efficiency and flexible dynamic partial reconfiguration capability. However, the traditional FPGA design flow is time-consuming and hard to debug. As FPGAs have matured, High-Level Synthesis (HLS) tools have gained widespread adoption, significantly lowering the difficulty of designing and implementing FPGA-based systems. HLS tools automatically compile a kernel written in a high-level language such as C/C++ into the corresponding HDL modules, which makes FPGA-based design accessible, especially to software engineers. HLS tools also provide built-in optimization directives that let system designers tune a design for goals such as hardware resource cost, performance, and energy consumption.

Moreover, unlike the traditional CPU-based architecture, the memory subsystem of a heterogeneous system is usually more complex. In a pure CPU-based architecture, the memory hierarchy generally consists of multiple levels of cache and main memory. Heterogeneous systems additionally contain scratchpad memory (SPM) and a shared cache that can be accessed by both the CPU and the accelerator. Each of these memory resources has its own characteristics, and using them rationally can significantly improve overall system performance. For CPU-FPGA heterogeneous multiprocessor system-on-chips (HMPSoCs), on-chip memory capacity is limited, which makes reasonable utilization even more important. State-of-the-art HLS tools still rely on system programmers to manually determine data placement within this complex memory hierarchy.

In this thesis, we propose an automatic data placement framework that can be seamlessly integrated with the commercial Vivado HLS. First, we design a set of microbenchmarks on the Zedboard HMPSoC to measure memory access latencies, such as a cache hit, a cache miss, and a direct access to main memory. Analysis of the resulting memory subsystem model reveals some counter-intuitive results: the cache does not deliver as much performance benefit as expected, and for burst-mode accesses the latency shows no correlation with the choice of memory resource. Based on these results, we find that the traditional frequency- and locality-based data placement strategies designed for CPU architectures lead to suboptimal system performance on CPU-FPGA HMPSoCs. Built on top of our memory latency analysis model and the LLVM compilation framework, we propose an integer linear programming (ILP) based automatic data placement framework that determines whether each array object should be accessed via the on-chip BRAM, the shared CPU L2 cache, or DDR memory directly. In addition, we design a greedy baseline algorithm. Experimental results on the Zedboard platform show an average 1.39x performance speedup over the baseline.
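The placement problem described above can be illustrated with a minimal sketch. All array names, sizes, access counts, latencies, and capacities below are invented for illustration and are not the thesis's measured values; the ILP solver is replaced by an exhaustive search over placements (equivalent on a toy instance), next to a greedy baseline in the spirit of the one used for comparison.

```python
from itertools import product

# Illustrative memory model (latency in cycles, capacity in KB).
# These numbers are assumptions for the sketch, not measurements.
MEMORIES = {
    "BRAM": {"latency": 2,  "capacity": 64},
    "L2":   {"latency": 12, "capacity": 256},
    "DDR":  {"latency": 60, "capacity": float("inf")},
}

# Hypothetical kernel arrays: (name, size_kb, access_count).
ARRAYS = [
    ("weights", 60, 10000),
    ("input",   32,  7000),
    ("output",  32,  7000),
]

def cost(assignment):
    """Total access latency of a placement (tuple of memory names, one per array)."""
    return sum(n_acc * MEMORIES[mem]["latency"]
               for (_, _, n_acc), mem in zip(ARRAYS, assignment))

def feasible(assignment):
    """Check the per-memory capacity constraints."""
    used = {m: 0 for m in MEMORIES}
    for (_, size, _), mem in zip(ARRAYS, assignment):
        used[mem] += size
    return all(used[m] <= MEMORIES[m]["capacity"] for m in MEMORIES)

def optimal_placement():
    """Exhaustive stand-in for the ILP: minimize total latency under capacity."""
    best = None
    for assignment in product(MEMORIES, repeat=len(ARRAYS)):
        if feasible(assignment) and (best is None or cost(assignment) < cost(best)):
            best = assignment
    return best

def greedy_placement():
    """Greedy baseline: most-accessed arrays grab the fastest memory first."""
    order = sorted(range(len(ARRAYS)), key=lambda i: -ARRAYS[i][2])
    ranked = sorted(MEMORIES, key=lambda m: MEMORIES[m]["latency"])
    used = {m: 0 for m in MEMORIES}
    result = [None] * len(ARRAYS)
    for i in order:
        for m in ranked:
            if used[m] + ARRAYS[i][1] <= MEMORIES[m]["capacity"]:
                result[i] = m
                used[m] += ARRAYS[i][1]
                break
    return tuple(result)

if __name__ == "__main__":
    opt, grd = optimal_placement(), greedy_placement()
    print("optimal:", opt, "cost:", cost(opt))
    print("greedy: ", grd, "cost:", cost(grd))
```

On this toy instance the greedy baseline pins the hottest array (`weights`) into BRAM, which blocks the two medium-hot arrays from fitting there together; the exhaustive (ILP-style) search finds the cheaper global placement instead, mirroring why a global formulation can beat a purely frequency-driven heuristic.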
Keywords/Search Tags: Data placement, memory architecture, FPGA, heterogeneous multiprocessor system-on-chip, high level synthesis