
Research And Implementation Of FPGA-based Accelerating Methods For Convolutional Neural Network

Posted on: 2019-01-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y Qiu
GTID: 2428330548476164
Subject: Computer Science and Technology
Abstract/Summary:
With the widespread use of deep learning in many fields, the Convolutional Neural Network (CNN), one of its basic models, has attracted increasing attention. CNNs are widely used in image classification, face recognition, language detection, document analysis and other tasks. Software-only acceleration of CNNs cannot keep up with growing speed and power requirements, so designing hardware CNN accelerators has become a research focus in academia. As highly parallel, compute-intensive acceleration hardware, the FPGA offers an excellent performance-to-power ratio and has unique advantages over GPUs and ASICs. In practice, however, using the limited on-chip FPGA resources efficiently to achieve higher performance with fewer resources, and designing a more general FPGA hardware architecture, remain enormous challenges.

This paper presents an efficient convolution module (ECM) that contains 4 PE units, each responsible for computing one output feature map. Convolution data and convolution parameters are passed between PE units through a cascade connection. To avoid repeatedly reading and writing external memory in layer-serial mode, a double-buffer storage mechanism keeps the intermediate results on the FPGA chip. In addition, the data caching and distribution scheme of the input register, the internal structure of the PE unit, and the pooling module are designed. Within the ECM, an ECM control module manages and schedules every unit.

Based on the common characteristics of convolutional neural networks, a general architecture for FPGA-based CNN hardware accelerators is designed. This architecture avoids the loss of overall computing speed caused by repeatedly reading and writing off-chip memory, and improves on the layer-serial mode, in which the uneven distribution of computation across layers wastes DSP resources. The architecture contains multiple groups of ECMs that share convolution data through broadcasting. A global control module handles the interaction with the PS (processing system) side to receive commands, and also controls the whole computation process.

Finally, based on the proposed general architecture, an FPGA hardware accelerator is implemented for the ZynqNet model. To further improve speed, the computational precision of ZynqNet is reduced from 32 bits to 16 bits, which allows a parallel structure of 64 PE units that increases computing parallelism. Results on ImageNet show that the optimized FPGA accelerator achieves a 10 times speedup over the original ZynqNet and a 20 times speedup over an Intel i5-5200U CPU. In performance-to-power ratio, the FPGA accelerator is 5.4 times better than the NVIDIA GTX 970 GPU version.
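The cascade of PE units inside an ECM can be illustrated with a small software model. This is a minimal Python sketch under stated assumptions, not the thesis's hardware design: the names `PE`, `sliding_windows` and `ecm_convolve` are hypothetical, the input is a single channel with stride 1 and no padding, kernels are preloaded into each PE instead of being shifted through the chain, and the loop runs sequentially, whereas real hardware pipelines the chain so that different PEs work on different windows in the same clock cycle. As in most CNN frameworks, "convolution" is computed as cross-correlation.

```python
from dataclasses import dataclass, field
from typing import Iterator, List

Matrix = List[List[float]]


@dataclass
class PE:
    """Hypothetical processing element: holds the kernel for one output feature map."""
    kernel: Matrix
    outputs: List[float] = field(default_factory=list)

    def process(self, window: Matrix) -> Matrix:
        # Multiply-accumulate the input window against this PE's kernel.
        acc = sum(w * k
                  for w_row, k_row in zip(window, self.kernel)
                  for w, k in zip(w_row, k_row))
        self.outputs.append(acc)
        # Pass the unchanged window on to the next PE in the cascade.
        return window


def sliding_windows(fmap: Matrix, k: int) -> Iterator[Matrix]:
    """Yield k x k windows in row-major order (stride 1, no padding)."""
    rows, cols = len(fmap), len(fmap[0])
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            yield [row[c:c + k] for row in fmap[r:r + k]]


def ecm_convolve(fmap: Matrix, kernels: List[Matrix]) -> List[Matrix]:
    """Feed every window through a chain of PEs; PE i produces output map i."""
    k = len(kernels[0])
    chain = [PE(kernel) for kernel in kernels]
    for window in sliding_windows(fmap, k):
        data = window
        for pe in chain:  # cascade: each PE hands the window to the next one
            data = pe.process(data)
    out_w = len(fmap[0]) - k + 1
    # Reshape each PE's flat output stream into a 2-D feature map.
    return [[pe.outputs[i:i + out_w] for i in range(0, len(pe.outputs), out_w)]
            for pe in chain]


# Usage: a 4-PE chain, mirroring the 4 PE units of one ECM.
fmap = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernels = [[[1, 0], [0, 0]], [[1, 1], [1, 1]], [[0, 0], [0, 1]], [[0, 0], [0, 0]]]
maps = ecm_convolve(fmap, kernels)
print(maps[1])  # [[12, 16], [24, 28]]
```

The point of the cascade is that each window is fetched from the input buffer once and then reused by every PE in the chain, so adding PEs raises the number of output maps computed per fetch without raising input bandwidth.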
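The double-buffer mechanism for intermediate results can be sketched as two on-chip buffers whose read/write roles swap after every layer, so only the network input and the final output cross the off-chip boundary. This is a minimal Python model, assuming (hypothetically) that each intermediate feature map fits in one on-chip buffer; `run_ping_pong` and `run_via_off_chip` are illustrative names, and the sketch counts transfers only, not the compute/transfer overlap a hardware implementation would also exploit.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def run_ping_pong(x: Vector, layers: Sequence[Layer]) -> Tuple[Vector, int]:
    """Run layers back to back, keeping intermediate results in two on-chip buffers.

    Returns the final output and the number of off-chip transfers.
    """
    buffers: List[Vector] = [list(x), []]  # buffer A receives the input, loaded once
    transfers = 1                           # one off-chip read of the network input
    for i, layer in enumerate(layers):
        src, dst = i % 2, (i + 1) % 2       # read one buffer, write the other, then swap
        buffers[dst] = layer(buffers[src])
    transfers += 1                          # one off-chip write of the final result
    return buffers[len(layers) % 2], transfers


def run_via_off_chip(x: Vector, layers: Sequence[Layer]) -> Tuple[Vector, int]:
    """Baseline: each layer reads its input from and writes its output to DRAM."""
    dram = list(x)
    transfers = 0
    for layer in layers:
        transfers += 1                      # read the layer input from DRAM
        dram = layer(dram)
        transfers += 1                      # write the layer output back to DRAM
    return dram, transfers


# Usage: three toy "layers" (scale, bias, ReLU).
layers = [lambda v: [2 * a for a in v],
          lambda v: [a + 1 for a in v],
          lambda v: [max(a, 0.0) for a in v]]
print(run_ping_pong([-1.0, 2.0], layers))     # ([0.0, 5.0], 2)
print(run_via_off_chip([-1.0, 2.0], layers))  # ([0.0, 5.0], 6)
```

Both paths give the same result; the difference is that off-chip traffic stays constant with network depth in the ping-pong version and grows by two transfers per layer in the baseline.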
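The abstract does not state which 16-bit number format replaces 32-bit arithmetic. A common choice on FPGAs is signed fixed point, and the sketch below assumes that, with a hypothetical Q7.8 layout (1 sign bit, 7 integer bits, 8 fractional bits, set by `frac_bits`), round-to-nearest quantization and saturation on overflow. The helper names are illustrative only; half-precision floating point would be an equally plausible reading of the abstract.

```python
Q_MIN, Q_MAX = -(1 << 15), (1 << 15) - 1  # signed 16-bit range


def saturate16(v: int) -> int:
    """Clamp to the 16-bit range instead of wrapping around on overflow."""
    return max(Q_MIN, min(Q_MAX, v))


def to_fixed16(x: float, frac_bits: int = 8) -> int:
    """Quantize a float to signed 16-bit fixed point with `frac_bits` fractional bits."""
    return saturate16(round(x * (1 << frac_bits)))


def from_fixed16(q: int, frac_bits: int = 8) -> float:
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def fixed_dot(a, b, frac_bits: int = 8) -> int:
    """Dot product of two 16-bit fixed-point vectors.

    Products are accumulated at full width, as in a DSP accumulator, and are
    rescaled and saturated to 16 bits only once at the end.
    """
    acc = sum(x * y for x, y in zip(a, b))  # each product has 2*frac_bits fractional bits
    return saturate16(acc >> frac_bits)     # arithmetic shift rounds toward -infinity


w = [to_fixed16(1.5), to_fixed16(0.5)]
x = [to_fixed16(-2.0), to_fixed16(4.0)]
print(from_fixed16(fixed_dot(w, x)))  # -1.0
print(from_fixed16(to_fixed16(0.1)))  # 0.1015625 (quantization error)
```

Narrower operands matter for parallelism because each multiplier consumes fewer DSP resources. If the target is a Zynq-7000 device, its DSP48E1 slices contain a 25 x 18 multiplier, so a 16-bit fixed-point multiply fits in a single slice, while a 32-bit floating-point multiply needs several slices plus logic.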
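Broadcasting convolution data to several ECM groups means every PE in a pass sees the same input, and each PE computes a different output channel. The abstract does not say how the 64 PE units are grouped, so the sketch below assumes, purely for illustration, 16 ECMs of 4 PEs each and a single input channel. `schedule_layer` and `input_reads` are hypothetical helpers that show the channel-to-PE mapping and how broadcasting reduces input fetches compared with giving every output channel its own copy of the input.

```python
import math
from typing import Dict, List, Tuple


def schedule_layer(out_channels: int, num_ecms: int = 16,
                   pes_per_ecm: int = 4) -> List[Dict[int, List[int]]]:
    """Assign output channels to PEs for each pass over the broadcast input.

    In every pass all ECMs receive the same input data; PE j of ECM e computes
    output channel start + e * pes_per_ecm + j.
    """
    total_pes = num_ecms * pes_per_ecm
    passes = []
    for start in range(0, out_channels, total_pes):
        assignment = {}
        for e in range(num_ecms):
            first = start + e * pes_per_ecm
            chans = list(range(first, min(first + pes_per_ecm, out_channels)))
            if chans:  # idle ECMs in the last pass get no entry
                assignment[e] = chans
        passes.append(assignment)
    return passes


def input_reads(out_channels: int, num_windows: int, num_ecms: int = 16,
                pes_per_ecm: int = 4) -> Tuple[int, int]:
    """Input-window fetches with broadcast vs. one private fetch per output channel."""
    passes = math.ceil(out_channels / (num_ecms * pes_per_ecm))
    return passes * num_windows, out_channels * num_windows


plan = schedule_layer(70)
print(len(plan), plan[1])    # 2 {0: [64, 65, 66, 67], 1: [68, 69]}
print(input_reads(70, 100))  # (200, 7000)
```

The last pass shows the cost of this scheme: when the channel count is not a multiple of the PE count, some PEs sit idle, which is one reason the per-layer distribution of computation affects DSP utilization.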
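The 5.4 times figure compares efficiency, not raw speed. Writing $T$ for throughput (for example images per second, an assumed metric, since the abstract does not name one) and $P$ for power in watts, the performance-to-power comparison is a ratio of ratios:

$$
\frac{\left(T/P\right)_{\text{FPGA}}}{\left(T/P\right)_{\text{GPU}}}
= \frac{T_{\text{FPGA}}}{T_{\text{GPU}}} \cdot \frac{P_{\text{GPU}}}{P_{\text{FPGA}}} = 5.4
$$

The abstract gives only the product, not the two factors, so the result holds whether the FPGA is faster than the GPU or slower but proportionally more frugal with power.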
Keywords/Search Tags: Convolutional Neural Network, FPGA, general architecture, ZynqNet, acceleration