
Research On Convolutional Neural Network Optimization Based On FPGA Cluster Heterogeneous Platform

Posted on: 2021-10-09
Degree: Master
Type: Thesis
Country: China
Candidate: H. T. Liang
Full Text: PDF
GTID: 2518306050467594
Subject: Master of Engineering

Abstract/Summary:
With the development of emerging fields such as the Internet of Things, autonomous driving, and intelligent security, growing data and computation volumes have steadily raised the performance requirements on processors, creating a mismatch between demand and available computing performance. Convolutional neural networks are data-intensive workloads, and for complex networks the hardware resources of a single computing device can hardly meet the demand. Based on a multi-board heterogeneous approach, this thesis proposes an FPGA cluster as an acceleration device for neural networks: on the one hand, multiple FPGAs can flexibly increase the available hardware resources, and on the other hand, they offer a better energy efficiency ratio. The thesis focuses on three key issues: the design of the FPGA cluster heterogeneous platform, the implementation of convolutional neural networks on that platform, and the optimization of the platform. The main work is as follows:

1. Realization of the hardware structure. A module-level pipeline structure is proposed for the FPGA cluster heterogeneous platform. The FPGA cluster serves as the inference accelerator of the neural network, so the computing power can be adjusted flexibly to the scale of different networks while retaining high energy efficiency. The ARM core on the master-node FPGA acts as the host of the heterogeneous platform, responsible for on-board data transmission and for communication with the host computer; the FPGA cluster acts as the coprocessor, mainly responsible for executing the convolutional neural network. During cluster processing, a module-level pipeline optimization strategy is adopted: the network is divided into sub-modules according to running time, and each sub-module is executed by a different FPGA. From the view of a single processing flow, tasks are executed sequentially by the different modules; from the view of large-scale task processing, the modules form a pipeline and can process in parallel.

2. Implementation and result analysis of the FPGA cluster heterogeneous platform. For LeNet-5, the FPGA cluster heterogeneous method takes 38.94% of the time a single FPGA needs to execute the network. Compared with single-board execution, the maximum throughput of the heterogeneous platform increases to 2.57 times, the resource conversion efficiency of the lookup table (LUT) increases to 2.61 times, and the energy efficiency ratio increases to 2.82 times. For the AlexNet network, the heterogeneous method takes 17.74% of the single-FPGA processing time; compared with single-board execution, the maximum throughput increases to 5.64 times, the resource conversion efficiency of the DSP increases to 3.11 times, that of the LUT increases to 2.39 times, and the energy efficiency ratio increases to 1.05 times, meeting the corresponding design requirements.

3. Optimization of throughput and energy efficiency, based on the module-level pipeline processing method in the coprocessor. Starting from a quantitative analysis of the data transmission delay between boards, an optimized design is proposed that takes the inter-board communication delay as the first stage of the pipeline. An evaluation method, resource conversion efficiency (RCE), is also proposed to quantitatively measure the contribution of unit resources to the throughput rate. For the implementation of AlexNet on the FPGA cluster heterogeneous platform, the optimized design raises the throughput to 105.19% of the previous solution.

4. For the partitioning of convolutional neural network modules, a task-matching optimization design based on bisection is proposed. In this design, the individual layers of the convolutional neural network are treated as the smallest indivisible units. The roofline model is used to analyze the throughput bottleneck in different situations, and the optimal division is obtained by iterative calculation based on the bisection idea. With this design, the throughput of LeNet-5 on the platform rises to 109.68% of the previous solution, and that of AlexNet to 106.07%, meeting the corresponding design requirements.
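The bisection-based partitioning in point 4 can be illustrated with a minimal sketch. Note the layer latencies below are hypothetical, not the thesis's measured values, and the sketch ignores the inter-board communication stage that the thesis folds into the pipeline: it binary-searches the smallest achievable bottleneck stage time, with a greedy check that packs consecutive layers into at most k FPGA stages.

```python
def feasible(times, limit, k):
    """Can the layers be split into <= k contiguous stages,
    each with total latency <= limit?"""
    stages, acc = 1, 0.0
    for t in times:
        if t > limit:
            return False  # a single layer already exceeds the limit
        if acc + t > limit:
            stages += 1   # start a new pipeline stage on the next FPGA
            acc = t
        else:
            acc += t
    return stages <= k

def partition_bottleneck(times, k, eps=1e-6):
    """Binary search (the bisection idea) for the smallest
    max-stage latency over contiguous k-way partitions."""
    lo, hi = max(times), sum(times)
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if feasible(times, mid, k):
            hi = mid
        else:
            lo = mid
    return hi

# Hypothetical per-layer latencies (ms) for a small CNN, mapped to 3 FPGAs.
layer_ms = [4.0, 9.0, 3.0, 6.0, 2.0]
bottleneck = partition_bottleneck(layer_ms, 3)
# Pipeline throughput ~ 1/bottleneck vs. single-board throughput ~ 1/sum:
speedup = sum(layer_ms) / bottleneck
```

With these example numbers the best contiguous split is [4], [9], [3, 6, 2], giving an 11 ms bottleneck stage and a steady-state speedup of 24/11 over single-board execution; the thesis additionally weighs each split against the roofline model rather than raw latency alone.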
Keywords/Search Tags:Inter-board Heterogeneous, FPGA cluster, pipeline structure, convolutional neural network, throughput optimization