
Research And Implementation Of Heterogeneous Computing Based On FPGA

Posted on: 2021-08-29
Degree: Master
Type: Thesis
Country: China
Candidate: Q F Li
Full Text: PDF
GTID: 2518306050967609
Subject: Master of Engineering
Abstract/Summary:
With the rapid development of artificial intelligence and big data, the demand for high-performance computing has increased dramatically. Artificial intelligence in particular is spreading into more and more application fields with increasingly complex scenarios, and new algorithm models iterate rapidly, placing ever-higher demands on hardware computing capability. GPUs offer superior computing performance, but their power consumption is an increasingly serious concern. FPGAs are characterized by low power consumption and high energy efficiency, and therefore have broad application prospects in accelerated computing. The traditional FPGA development flow has a high entry threshold and long development cycles; the emergence of the OpenCL standard has greatly improved this situation. Matrix multiplication is a classic algorithm that is widely used in weather forecasting, nuclear physics, and deep learning. The SqueezeNet neural network model is a classic small-scale convolutional neural network with few parameters, which makes it easy to deploy on an FPGA. This thesis therefore researches OpenCL-accelerated implementations of matrix multiplication and the SqueezeNet model on an FPGA heterogeneous platform.

The thesis first introduces heterogeneous computing and the OpenCL standard. Based on the characteristics of the FPGA architecture, and taking local memory into account, we establish a mathematical model for FPGA performance analysis. We then analyze the computational complexity and parallelism of matrix multiplication and the SqueezeNet model, and present designs that improve parallelism. From this analysis, an optimization scheme is proposed. For matrix multiplication, we adopt blocked (tiled) computation, storing data blocks in local memory to reduce data-read time. The performance analysis model shows that the resulting bottleneck is the DSP processing speed.

For the SqueezeNet model, we design a general convolution acceleration kernel and a pooling acceleration kernel, and implement the entire network by multiplexing them. For the convolution operation, we map the three-dimensional input of a convolution layer to a two-dimensional matrix, converting the three-dimensional convolution into a two-dimensional matrix multiplication, which we then optimize with the same blocking method used for matrix multiplication. A dedicated data-reading kernel performs block-wise reading and staging of the input: the convolution kernel consumes data from the data-reading kernel, performs the matrix multiplication, and passes the result to a data-writing kernel for output. For the pooling operation, we unroll the pooling loop to enable parallel computation over multiple inputs. The performance analysis model shows that the bottleneck of the convolution kernel is the DSP processing speed, while the bottleneck of the pooling kernel is global memory bandwidth.

Finally, using the OpenCL standard, we implement the above optimization scheme on the FPGA heterogeneous computing platform and compare the results with CPU and GPU implementations. Although the FPGA lags behind in absolute performance, it achieves the highest performance-to-power ratio. For matrix multiplication, the performance-to-power ratio of the FPGA is 14.75 times that of the CPU and 1.83 times that of the GPU; for the SqueezeNet model, it is 10.48 times that of the CPU and 1.12 times that of the GPU.
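The blocked-computation idea described above can be sketched in Python/NumPy. This is an illustrative model, not the thesis's actual OpenCL kernel: the tile size `BLOCK` is an arbitrary assumption, and the NumPy tile slices stand in for the local-memory buffers that, on the FPGA, let each global-memory element be fetched only once per tile.

```python
import numpy as np

BLOCK = 4  # illustrative tile edge; on the FPGA this matches the local-memory tile size


def blocked_matmul(A, B, block=BLOCK):
    """Tiled matrix multiply: C = A @ B computed block by block.

    Each (i, j) tile of C accumulates partial products of an A row-tile
    and a B column-tile. In the OpenCL kernel those tiles would be staged
    in local memory to cut down global-memory reads.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m), dtype=A.dtype)
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            for p0 in range(0, k, block):
                # "Load" one tile of A and one of B (local-memory staging).
                a_tile = A[i0:i0 + block, p0:p0 + block]
                b_tile = B[p0:p0 + block, j0:j0 + block]
                # Accumulate the partial product into the C tile.
                C[i0:i0 + block, j0:j0 + block] += a_tile @ b_tile
    return C
```

NumPy's ragged slicing handles matrix sizes that are not multiples of the tile size; a hardware kernel would instead pad or guard the edge tiles.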
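The convolution design maps a three-dimensional input to a two-dimensional matrix so that convolution reduces to matrix multiplication. A minimal im2col-style sketch of that mapping, assuming stride 1 and no padding (the thesis does not specify these details, so they are illustrative choices):

```python
import numpy as np


def im2col(x, kh, kw):
    """Unfold a (C, H, W) input into a (C*kh*kw, out_h*out_w) matrix.

    Each column holds one receptive field, so a convolution with weights
    of shape (F, C, kh, kw) becomes: W.reshape(F, -1) @ im2col(x, kh, kw).
    Stride 1 and no padding are assumed for brevity.
    """
    c, h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    col = 0
    for i in range(out_h):
        for j in range(out_w):
            # Flatten one (C, kh, kw) receptive field into a column.
            cols[:, col] = x[:, i:i + kh, j:j + kw].ravel()
            col += 1
    return cols


def conv2d_as_matmul(x, weights):
    """3-D convolution expressed as one 2-D matrix multiplication."""
    f, c, kh, kw = weights.shape
    out_h, out_w = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    out = weights.reshape(f, -1) @ im2col(x, kh, kw)
    return out.reshape(f, out_h, out_w)
```

Once the input is in this 2-D form, the same blocking/local-memory optimization used for plain matrix multiplication applies directly to the convolution kernel.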
Keywords/Search Tags:FPGA, Heterogeneous computing, Convolutional neural network, OpenCL