
Research On Special Heterogeneous Accelerator Of Convolutional Neural Network Based On FPGA

Posted on: 2021-01-25  Degree: Master  Type: Thesis
Country: China  Candidate: M X Zhao  Full Text: PDF
GTID: 2428330602482324  Subject: Integrated circuit engineering
Abstract/Summary:
In recent years, deep learning has become a major research field. Convolutional neural networks (CNNs) have achieved great success in deep-learning applications such as text recognition, image classification, and object detection. However, the forward-inference computation of a CNN model often requires millions or even hundreds of millions of floating-point multiply-accumulate (MAC) operations, along with storage for the floating-point parameters, and traditional CPUs and GPUs cannot fully exploit the parallelism inherent in CNN computation. FPGAs offer low power consumption, flexible programmability, and short development cycles, and their internal logic resources can carry out parallel computation at low power. An FPGA is therefore an ideal platform on which to design a CNN accelerator.

Based on a Xilinx Zynq-series FPGA, using its CPU+FPGA heterogeneous SoC development platform and the Verilog HDL hardware description language, this thesis implements a special-purpose heterogeneous accelerating co-processor for CNNs. In this design, the CPU handles sending images, polling, interrupts, and displaying classification results, while the FPGA performs the computation of the CNN model.

The thesis first reviews the historical development of CNNs and the research status of CNN accelerators. It then analyzes the computation process and overall structure of CNN forward inference, discusses in detail the parallelism available in that computation, and proposes implementation methods for the different forms of parallelism together with their resource and bandwidth requirements.

For LeNet-5, a convolutional neural network widely used in handwritten-digit recognition, the thesis first compresses the model with Intel's Dynamic Network Surgery technique. The 32-bit floating-point parameters are then quantized to 8-bit fixed-point parameters with Intel's Incremental Network Quantization technique, and further optimized to 5-bit parameters in order to design a special-purpose fixed-point shift multiplier.

Verilog is used to design the computation modules: the multiply-accumulate array, pooling, activation function, rounding, and others. These modules are integrated following a pipelined design, yielding a CNN hardware computation circuit with 8-bit computation accuracy that realizes the forward inference of LeNet-5.

Building on this, the thesis proposes two different heterogeneous SoC systems and describes their overall architecture, cache strategy, and PS-side and PL-side designs. Both systems are then modeled, simulated, and verified. On the 10,000 images of the MNIST test set, the design achieves a recognition accuracy of 98.9%. The experimental results show that in the optimized SoC system, the FPGA completes the inference of one handwritten-digit image in 24 µs at a clock frequency of 100 MHz; its average computing throughput reaches 15.21 GMAC/s and its peak reaches 33.6 GMAC/s. The performance-to-power ratio reaches 6.8890 GMAC/W, which is 1520 times that of a general-purpose CPU (an Intel i5-8400 processor) and 160 times that of a general-purpose GPU (a GTX 1050 Ti graphics card).
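The shift multiplier rests on the idea behind Incremental Network Quantization: weights are constrained to zero or signed powers of two, so each multiplication reduces to a bit shift. The sketch below models this in Python; the 5-bit code layout (one sign/zero flag plus a clamped shift exponent) and the exponent range are illustrative assumptions, not the thesis's exact format.

```python
import math

# Hypothetical power-of-two (shift) quantization in the spirit of INQ.
# Layout assumption: a weight is stored as (sign, exponent), i.e. roughly
# 1 sign bit + a 4-bit shift amount -- about 5 bits per weight.

def quantize_to_pow2(w: float, min_exp: int = -8, max_exp: int = 0):
    """Map a float weight to (sign, exp) so that w is approximated by sign * 2**exp."""
    if w == 0.0:
        return 0, min_exp          # sign 0 encodes a pruned/zero weight
    sign = 1 if w > 0 else -1
    exp = round(math.log2(abs(w)))  # nearest power-of-two exponent
    exp = max(min_exp, min(max_exp, exp))
    return sign, exp

def shift_multiply(activation: int, sign: int, exp: int) -> int:
    """Multiply an integer activation by a power-of-two weight using shifts only."""
    if sign == 0:
        return 0
    product = activation << exp if exp >= 0 else activation >> -exp
    return product if sign > 0 else -product
```

In hardware this replaces a DSP multiplier with a barrel shifter and a sign flip, which is what makes a dedicated fixed-point shift multiplier attractive on an FPGA.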
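The pipeline stages named above (MAC array, activation, rounding) can be modeled in software to check the fixed-point arithmetic. The following minimal Python model computes one output value; the bit widths, fractional-bit count, and round-half-up mode are assumptions for illustration, not the thesis's exact datapath parameters.

```python
# Minimal software model of one datapath slice: integer MAC accumulation,
# ReLU activation, then rounding with saturation back to signed 8 bits.

def mac(acc: int, a: int, w: int) -> int:
    """One multiply-accumulate step on integer (fixed-point) operands."""
    return acc + a * w

def relu(x: int) -> int:
    return x if x > 0 else 0

def round_sat_8bit(x: int, frac_bits: int = 4) -> int:
    """Drop frac_bits fractional bits with round-half-up, saturate to [-128, 127]."""
    rounded = (x + (1 << (frac_bits - 1))) >> frac_bits
    return max(-128, min(127, rounded))

# One output value: accumulate three products, activate, then round.
acc = 0
for a, w in [(10, 3), (7, -2), (5, 4)]:
    acc = mac(acc, a, w)
out = round_sat_8bit(relu(acc))
```

In the pipelined hardware these three steps would be separate Verilog modules operating on different data in the same clock cycle; the model only checks their combined arithmetic.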
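The reported figures can be cross-checked with a back-of-envelope calculation. The input values below come from the abstract; the derived quantities (MAC units implied by peak throughput, MAC operations per inference) are inferences, not numbers stated in the thesis.

```python
# Consistency check of the reported performance figures.
clock_hz = 100e6      # PL clock frequency: 100 MHz
peak_gmacs = 33.6     # peak throughput, GMAC/s
avg_gmacs = 15.21     # average throughput, GMAC/s
latency_s = 24e-6     # inference latency per image: 24 us

# Peak MACs completed per clock cycle -- a lower bound on parallel MAC units.
macs_per_cycle = peak_gmacs * 1e9 / clock_hz   # 336 MACs/cycle

# Approximate MAC operations per LeNet-5 inference implied by the average rate.
macs_per_image = avg_gmacs * 1e9 * latency_s   # roughly 3.65e5 MACs
```

A peak of 336 MACs per cycle at 100 MHz is consistent with a moderately sized MAC array on a Zynq-class device, and a few hundred thousand MACs per image is plausible for a pruned LeNet-5.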
Keywords/Search Tags: CNN accelerating co-processor, FPGA, SoC system, Shift computation