
Design Space Exploration For Deep Learning Accelerator And Design And Implementation Of The Accelerator In FPGA

Posted on: 2018-08-08    Degree: Master    Type: Thesis
Country: China    Candidate: Z S Li    Full Text: PDF
GTID: 2428330623450863    Subject: Microelectronics and Solid State Electronics
Abstract/Summary:
Research on Convolutional Neural Networks (CNNs) is an important branch of deep learning. Owing to their strong non-linear fitting capability, CNNs have achieved brilliant results in image classification and speech recognition. As CNNs continue to develop, they are applied in more and more practical fields, and the performance requirements placed on them keep rising. Analyzing the top-performing ImageNet networks of recent years shows that the networks are becoming increasingly complex, and that memory access and computation have become the main performance bottlenecks. Accelerating CNNs is therefore an indispensable task. FPGAs have inherent advantages over GPUs and ASICs in flexibility, power consumption, and development cycle, which makes them an important platform for CNN acceleration that cannot be ignored. To address these challenges, we carried out the following work.

First, based on the roofline model, a hardware simulator, and peripheral optimization modules, and drawing on existing research results, we propose ACCDSE, a design space exploration framework for the convolutional layers of CNNs. The framework produces parameter configurations under various performance requirements and determines the relevant accelerator parameters in the early design stage (a roofline-style estimate is sketched after this abstract).

Second, by modifying the deep learning framework Caffe, we replace the floating-point arithmetic in the training process with fixed-point arithmetic, which reduces computational complexity and provides a training platform for hardware accelerators that use fixed-point arithmetic (a quantization sketch appears below).

Third, building on the results above, we design and implement an inference acceleration engine for LeNet on an FPGA platform. The engine uses 8-bit fixed-point arithmetic and several optimization techniques to improve performance, including weight resolution and ping-pong optimization (sketched below), and it uses mathematical models to optimize resource allocation. Our group produced several versions of the hardware implementation on a Xilinx 485T FPGA. The hardware evaluation shows that, under the same configuration parameters, the 8-bit fixed-point inference engine reduces latency by 31.43% compared with the 32-bit fixed-point engine, while saving 87.01% of the LUT resources, 66.5% of the on-chip memory (BRAM), 65.11% of the DSP resources, and 47.95% of the power consumption. By using ping-pong optimization to realize a coarse-grained pipeline, the throughput reaches 44.9 GOPS with only a 1% reduction in accuracy compared with the 32-bit fixed-point engine.
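As a minimal illustration of the roofline reasoning behind a framework like ACCDSE, the C++ sketch below estimates the attainable performance of a single design point as the minimum of the computational roof and the bandwidth roof. The struct fields, numbers, and function names are illustrative assumptions, not taken from the thesis.

```cpp
#include <algorithm>
#include <cstdio>

// One candidate design point for a convolutional layer (illustrative values).
struct DesignPoint {
    double total_ops;        // total arithmetic operations of the layer
    double external_bytes;   // bytes moved to/from external memory
    double peak_gops;        // computational roof of the design (GOPS)
    double bandwidth_gbs;    // external memory bandwidth (GB/s)
};

// Roofline model: attainable performance is the smaller of the computational
// roof and (computation-to-communication ratio) * (memory bandwidth).
double attainable_gops(const DesignPoint& d) {
    double ctc = d.total_ops / d.external_bytes;   // ops per byte accessed
    return std::min(d.peak_gops, ctc * d.bandwidth_gbs);
}

int main() {
    // Hypothetical design point: 1 GOP layer, 50 MB traffic, 100 GOPS roof, 4 GB/s.
    DesignPoint d{1.0e9, 5.0e7, 100.0, 4.0};
    std::printf("attainable: %.1f GOPS\n", attainable_gops(d));  // prints 80.0
}
```

A full exploration pass would enumerate candidate tiling and unrolling factors, evaluate each with a model of this kind, and keep only the points that meet the performance requirements.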
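The fixed-point training platform replaces floating-point arithmetic with fixed-point arithmetic. One common scheme, shown here as an assumption rather than the thesis's exact Caffe modification, is symmetric 8-bit quantization with a per-layer fractional length (Q-format):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>

// Quantize a float to signed 8-bit fixed point with `frac_bits` fractional bits.
int8_t quantize(float x, int frac_bits) {
    float scaled = x * std::ldexp(1.0f, frac_bits);   // x * 2^frac_bits
    long v = std::lround(scaled);
    v = std::max(-128L, std::min(127L, v));           // saturate to int8 range
    return static_cast<int8_t>(v);
}

// Recover an approximate float value from the fixed-point representation.
float dequantize(int8_t q, int frac_bits) {
    return static_cast<float>(q) * std::ldexp(1.0f, -frac_bits);
}

int main() {
    int frac = 5;                                     // hypothetical fractional length
    float w = 0.73f;
    int8_t q = quantize(w, frac);
    std::printf("%f -> %d -> %f\n", w, q, dequantize(q, frac));
}
```

The fractional length trades range against precision; in practice it is chosen per layer from the observed range of weights and activations.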
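Ping-pong optimization doubles the on-chip buffers so that loading the next data tile overlaps with computing on the current one, yielding a coarse-grained pipeline. The sketch below shows only the buffer-swapping control flow; in this software rendering the two stages run sequentially, whereas the hardware engine would execute them concurrently. All sizes and helper functions are hypothetical.

```cpp
#include <array>
#include <cstdio>

constexpr int TILE = 256;            // illustrative tile size
using Buffer = std::array<int, TILE>;

// Stand-in for a DMA transfer from external memory into an on-chip buffer.
void load_tile(Buffer& buf, int tile_id) {
    for (int i = 0; i < TILE; ++i) buf[i] = tile_id;
}

// Stand-in for the MAC array consuming one buffered tile.
long compute_tile(const Buffer& buf) {
    long acc = 0;
    for (int v : buf) acc += v;
    return acc;
}

int main() {
    Buffer ping{}, pong{};
    const int num_tiles = 8;
    long total = 0;
    load_tile(ping, 0);                               // prologue: fill first buffer
    for (int t = 0; t < num_tiles; ++t) {
        Buffer& cur  = (t % 2 == 0) ? ping : pong;    // buffer being computed on
        Buffer& next = (t % 2 == 0) ? pong : ping;    // buffer being refilled
        if (t + 1 < num_tiles)
            load_tile(next, t + 1);                   // overlaps with compute in HW
        total += compute_tile(cur);
    }
    std::printf("total = %ld\n", total);
}
```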
Keywords/Search Tags: Convolutional Neural Network, Design Space Exploration, FPGA, Accelerator, Quantization