
Research On FPGA Hardware Acceleration Platform For Deep Learning

Posted on: 2019-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: Q F Hong
Full Text: PDF
GTID: 2348330569487895
Subject: Microelectronics and Solid State Electronics
Abstract/Summary:
In recent years, there has been a worldwide upsurge of interest in artificial intelligence. AI technology has not only been widely studied in academia, but has also been put into commercial use after deep development in industry, bringing huge benefits to the national economy. Deep learning is one of the most popular technologies in the field of artificial intelligence. It is a family of algorithmic models, inspired by the biological principles of how the human brain recognizes things, for learning from large-scale data, and it has achieved excellent results in areas such as computer vision, natural language processing, and speech recognition.

In the early days, people used CPUs to execute deep learning algorithms, but the CPU cannot efficiently run algorithms that involve a large number of numerical calculations. Later, GPUs were introduced into deep learning. A GPU contains a large number of computing cores and is well suited to accelerating highly parallel models such as the convolutional neural network. However, the GPU has the disadvantage of high energy consumption; deploying GPUs on a large scale runs counter to the trend toward green, energy-saving data centers. More recently, the FPGA, as a new class of accelerating device, has gradually attracted the attention of many researchers because of advantages such as low power consumption and reconfigurability.

In this context, this thesis combines the features of the CPU and the FPGA, using the CPU as the control host and the FPGA as the acceleration device to build a master-slave hardware acceleration platform. The platform is used to accelerate two important models in deep learning: the recurrent neural network and the convolutional neural network. The former is used to solve pattern recognition problems on time series, while the latter performs feature recognition in two-dimensional space.

For the recurrent neural network, we adopt the idea of combining data parallelism and task parallelism, design a general parallel acceleration scheme for the training process, and explore the influence of the number of hidden-layer neurons on acceleration performance. Using the heterogeneous parallel programming language OpenCL, we wrote the kernel programs executed on the FPGA. Experiments show that as the number of hidden-layer neurons increases, the performance of the FPGA gradually approaches that of the CPU, while the energy efficiency of the FPGA is higher than that of both the CPU and the GPU.

For the convolutional neural network, we similarly designed a general parallel acceleration scheme covering both the training and inference processes. Experiments show that on the MNIST data set, at the same accuracy, the FPGA achieves a shorter inference time than the CPU, and its energy efficiency is close to 10 times that of the CPU and slightly higher than that of the GPU. On the CIFAR-10 data set, the speed and energy efficiency of the FPGA fall between those of the CPU and the GPU. In a channel-based convolutional neural network acceleration experiment on the inference process, the inference time of the FPGA under the channel-based scheme is also slightly lower than under the general scheme. Therefore, the general parallel acceleration scheme implemented on the FPGA can accelerate the training and inference of deep learning algorithms without reducing accuracy, and the parallel acceleration scheme based on the channel structure achieves a better acceleration effect in inference than the general scheme.
Keywords/Search Tags:Deep Learning, FPGA, Parallel acceleration, OpenCL