
Design Space Exploration For Deep Learning Accelerator And Design And Implementation Of The Accelerator In FPGA

Posted on: 2018-08-08    Degree: Master    Type: Thesis
Country: China    Candidate: Z S Li    Full Text: PDF
GTID: 2428330623450863    Subject: Microelectronics and Solid State Electronics
Abstract/Summary:
Research on Convolutional Neural Networks (CNNs) is an important branch of deep learning. Owing to their strong non-linear fitting capability, CNNs have achieved brilliant results in image classification and speech recognition. As CNNs continue to develop, they are applied in more and more practical fields, and the performance requirements placed on them keep rising. Analyzing the top-performing ImageNet networks of recent years shows that the networks are becoming increasingly complex, and that memory access and computation have become the main performance bottlenecks. Accelerating CNNs is therefore an indispensable task. FPGAs have inherent advantages over GPUs and ASICs in flexibility, power consumption, and development cycle, which makes them an important platform for CNN acceleration that cannot be ignored. To address these challenges, we carried out the following work.

First, based on the roofline model, a hardware simulator, and peripheral optimization modules, and drawing on existing research results, we propose ACCDSE, a design space exploration framework for the convolutional layers of CNNs. The framework produces parameter configurations under various performance requirements and determines the relevant accelerator parameters in the early design stage (a roofline-style estimate is sketched after this abstract).

Second, by modifying the deep learning framework Caffe, we replace the floating-point arithmetic in the training process with fixed-point arithmetic, which reduces computational complexity and provides a training platform for hardware accelerators that use fixed-point arithmetic (a quantization sketch appears below).

Third, building on the results above, we design and implement an inference acceleration engine for LeNet on an FPGA platform. The engine uses 8-bit fixed-point arithmetic and several optimization techniques to improve performance, including weight resolution and ping-pong optimization (sketched below), and it uses mathematical models to optimize resource allocation. Our group produced several versions of the hardware implementation on a Xilinx 485T FPGA. The hardware evaluation shows that, under the same configuration parameters, the 8-bit fixed-point inference engine reduces latency by 31.43% compared with the 32-bit fixed-point engine, while saving 87.01% of the LUT resources, 66.5% of the on-chip memory (BRAM), 65.11% of the DSP resources, and 47.95% of the power consumption. By using ping-pong optimization to realize a coarse-grained pipeline, the throughput reaches 44.9 GOPS with only a 1% reduction in accuracy compared with the 32-bit fixed-point engine.
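As a minimal illustration of the roofline reasoning behind a framework like ACCDSE, the C++ sketch below estimates the attainable performance of a single design point as the minimum of the computational roof and the bandwidth roof. The struct fields, numbers, and function names are illustrative assumptions, not taken from the thesis.

```cpp
#include <algorithm>
#include <cstdio>

// One candidate design point for a convolutional layer (illustrative values).
struct DesignPoint {
    double total_ops;        // total arithmetic operations of the layer
    double external_bytes;   // bytes moved to/from external memory
    double peak_gops;        // computational roof of the design (GOPS)
    double bandwidth_gbs;    // external memory bandwidth (GB/s)
};

// Roofline model: attainable performance is the smaller of the computational
// roof and (computation-to-communication ratio) * (memory bandwidth).
double attainable_gops(const DesignPoint& d) {
    double ctc = d.total_ops / d.external_bytes;   // ops per byte accessed
    return std::min(d.peak_gops, ctc * d.bandwidth_gbs);
}

int main() {
    // Hypothetical design point: 1 GOP layer, 50 MB traffic, 100 GOPS roof, 4 GB/s.
    DesignPoint d{1.0e9, 5.0e7, 100.0, 4.0};
    std::printf("attainable: %.1f GOPS\n", attainable_gops(d));  // prints 80.0
}
```

A full exploration pass would enumerate candidate tiling and unrolling factors, evaluate each with a model of this kind, and keep only the points that meet the performance requirements.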
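The fixed-point training platform replaces floating-point arithmetic with fixed-point arithmetic. One common scheme, shown here as an assumption rather than the thesis's exact Caffe modification, is symmetric 8-bit quantization with a per-layer fractional length (Q-format):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>

// Quantize a float to signed 8-bit fixed point with `frac_bits` fractional bits.
int8_t quantize(float x, int frac_bits) {
    float scaled = x * std::ldexp(1.0f, frac_bits);   // x * 2^frac_bits
    long v = std::lround(scaled);
    v = std::max(-128L, std::min(127L, v));           // saturate to int8 range
    return static_cast<int8_t>(v);
}

// Recover an approximate float value from the fixed-point representation.
float dequantize(int8_t q, int frac_bits) {
    return static_cast<float>(q) * std::ldexp(1.0f, -frac_bits);
}

int main() {
    int frac = 5;                                     // hypothetical fractional length
    float w = 0.73f;
    int8_t q = quantize(w, frac);
    std::printf("%f -> %d -> %f\n", w, q, dequantize(q, frac));
}
```

The fractional length trades range against precision; in practice it is chosen per layer from the observed range of weights and activations.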
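Ping-pong optimization doubles the on-chip buffers so that loading the next data tile overlaps with computing on the current one, yielding a coarse-grained pipeline. The sketch below shows only the buffer-swapping control flow; in this software rendering the two stages run sequentially, whereas the hardware engine would execute them concurrently. All sizes and helper functions are hypothetical.

```cpp
#include <array>
#include <cstdio>

constexpr int TILE = 256;            // illustrative tile size
using Buffer = std::array<int, TILE>;

// Stand-in for a DMA transfer from external memory into an on-chip buffer.
void load_tile(Buffer& buf, int tile_id) {
    for (int i = 0; i < TILE; ++i) buf[i] = tile_id;
}

// Stand-in for the MAC array consuming one buffered tile.
long compute_tile(const Buffer& buf) {
    long acc = 0;
    for (int v : buf) acc += v;
    return acc;
}

int main() {
    Buffer ping{}, pong{};
    const int num_tiles = 8;
    long total = 0;
    load_tile(ping, 0);                               // prologue: fill first buffer
    for (int t = 0; t < num_tiles; ++t) {
        Buffer& cur  = (t % 2 == 0) ? ping : pong;    // buffer being computed on
        Buffer& next = (t % 2 == 0) ? pong : ping;    // buffer being refilled
        if (t + 1 < num_tiles)
            load_tile(next, t + 1);                   // overlaps with compute in HW
        total += compute_tile(cur);
    }
    std::printf("total = %ld\n", total);
}
```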
Keywords/Search Tags: Convolutional Neural Network, Design Space Exploration, FPGA, Accelerator, Quantization