Research On Optimization Technologies Of FPGA-based Convolutional Neural Network Implementation

Posted on: 2022-12-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Ye
GTID: 2518306764463784
Subject: Automation Technology
Abstract/Summary:
Artificial intelligence is widely used across many fields and has made daily life considerably more convenient. The Convolutional Neural Network (CNN) is an important branch of artificial intelligence and is widely used in image classification, speech recognition, and other fields. A CNN can be implemented on a CPU, FPGA, GPU, or ASIC; among these, the FPGA has attracted increasing attention because of its flexible design and low power consumption. However, FPGA-based CNN implementation also faces challenges, such as how to use on-chip resources to obtain high computing power and achieve high throughput. This thesis studies FPGA-based CNN optimization technologies, focusing mainly on the convolutional layer, which accounts for the largest share of computation. The optimization techniques are used to implement the Fast Super-Resolution Convolutional Neural Network (FSRCNN) and are verified experimentally. The main work is as follows.

The parallelism in the convolutional layer is analyzed, which provides a basis for performance optimization of FPGA-based CNNs.

Computational optimization techniques are studied and two kinds of efficient computing units are designed. Based on the one-dimensional Winograd algorithm and convolution decomposition, an Efficient Convolutional Element (ECE) is designed, which effectively reduces DSP usage on the FPGA. An Efficient Multiplication Unit (EMCU) is designed according to the characteristics of the DSP and the multiplication algorithm. The EMCU can compute A×C and B×C simultaneously using only one DSP when A and B are signed numbers, C is unsigned, and the bit widths of A, B, and C do not exceed 8 bits. The EMCU is further optimized using multiplication decomposition, and the improved EMCU works even when the bit widths of A, B, and C are 13 bits.

FSRCNN is implemented on a Zynq-7035 FPGA with several optimization technologies. The advantages and disadvantages of the single Convolution Layer Processor (CLP) scheme and the multi-CLP scheme are compared, and the multi-CLP structure is chosen for the FPGA-based FSRCNN. The FSRCNN model data are converted to fixed-point numbers, with a loss of model accuracy of less than 4%. An input data buffer and a weight buffer are designed to meet the data requirements of the convolution computing units. Pipelining is used inside each CLP and between CLP layers to raise the working frequency of FSRCNN. The parallelism of the CLP in each layer is optimized to improve pipeline efficiency. ECE and EMCU are applied in the FSRCNN implementation to reduce DSP usage.

The experimental results show that the optimization techniques used in this design, such as ECE, EMCU, pipeline optimization, and parallelism optimization, improve the computing power obtained from the DSPs and the throughput of the CNN. The FSRCNN implemented on the FPGA can reconstruct images from 360×202 to 1080×606 at a working frequency of up to 150 MHz. It reaches a throughput of 250 GOPS and computes 949 times faster than a 5600 G CPU.
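The idea of computing A×C and B×C with a single multiplier can be illustrated in software. The sketch below is one common way to do it, not necessarily the scheme the thesis uses: A and B are packed into one wide operand, a single multiplication is performed, and the two products are separated afterwards. The function name `packed_dual_multiply` and the 16-bit field offset `shift` are assumptions made for illustration. The offset is chosen because B×C, with B signed 8-bit and C unsigned 8-bit, always fits in a signed 16-bit field.

```python
from typing import Tuple


def packed_dual_multiply(a: int, b: int, c: int, shift: int = 16) -> Tuple[int, int]:
    """Return (a*c, b*c) using one multiplication (illustrative sketch).

    Assumes a and b are signed 8-bit values and c is an unsigned 8-bit value.
    """
    assert -128 <= a <= 127 and -128 <= b <= 127, "a and b must be signed 8-bit"
    assert 0 <= c <= 255, "c must be unsigned 8-bit"

    # Pack both signed operands into one wide operand: a in the upper field,
    # b in the lower field.
    packed = (a << shift) + b

    # The single multiplication; mathematically this is (a*c << shift) + b*c.
    product = packed * c

    # Lower field: take the low `shift` bits and sign-extend them to get b*c.
    low = product & ((1 << shift) - 1)
    if low >= 1 << (shift - 1):
        low -= 1 << shift

    # Upper field: removing the lower product leaves an exact multiple of
    # 2**shift, so the shift recovers a*c without rounding error.
    high = (product - low) >> shift
    return high, low


print(packed_dual_multiply(-3, 5, 7))  # (-21, 35)
```

With 8-bit inputs the packed operand needs at most 25 signed bits, which matches the width of the 25×18 multiplier in the DSP48E1 slices of Zynq-7000 devices. Wider inputs, such as the 13-bit case mentioned above, no longer fit this simple packing and would need an additional decomposition step that is not shown here.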
Keywords/Search Tags:Convolutional Neural Network, FPGA, optimization technologies, Fast Super-Resolution Convolutional Neural Network