
Research on a Convolutional Neural Network Acceleration System Based on FPGA

Posted on: 2022-03-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y Xiang
Full Text: PDF
GTID: 2518306341463704
Subject: Electronics and Communications Engineering

Abstract/Summary:
Convolutional neural networks (CNNs), as a new way to realize artificial intelligence, are usually deployed on network edge devices to perform object detection and recognition. In edge computing scenarios, these devices often operate under difficult maintenance conditions, poor power supply, and high computing performance requirements. Compared with traditional neural network processors such as CPUs and GPUs, the field programmable gate array (FPGA) offers low power consumption, small volume, strong parallel computing capability, and low demands on supporting infrastructure, making it a good choice as a computing platform for CNN edge devices. This thesis studies acceleration strategies for CNNs deployed in edge computing scenarios from the following three aspects.

Firstly, the parameters involved in CNN computation are analyzed and optimized. Traditional CNN calculators commonly use the poly decay strategy to update the learning rate, but poly decay involves a large number of floating-point operations, which is a poor match for FPGA hardware. This thesis therefore improves the poly decay strategy by using a fixed step size as the basis of the learning-rate update, reducing the algorithmic complexity of poly decay and the computational load on the neural network calculator. For the softmax classification function, the traditional method requires costly exponential calculations; within an allowed maximum error, this thesis instead approximates the exponential function piecewise and stores the segments in a look-up table that is consulted during computation, improving performance. For ordinary parameters, the Q-format bit width is determined dynamically, improving on single fixed Q-value quantization and further reducing the storage space the parameters occupy.

Secondly, the parallelism of the CNN is analyzed and optimized. At the convolution kernel level, based on an analysis of a single kernel's operation and a pipelined strategy, a data-buffering method is proposed that reduces both the number of data exchanges with external memory and the data waiting time, makes maximal use of the FPGA's computing resources, and streams out the computation results. At the multi-channel feature map level, the final multi-channel convolution result is obtained through channel separation, streamed output, and delayed addition. At the convolution layer level, an inferential precomputation method starts the next layer's computation before all results of the preceding layer are available, with internal registers holding the temporary results to improve computational efficiency.

Finally, the hardware resource allocation during CNN computation and the hardware implementation of each module are designed. Based on the Xilinx XC7Z035-2FFG676I device, a CNN accelerated computing system is built on a Xilinx 7z035 development board. In the data transmission system, the QSPI bus is used for on-chip data transmission, and a queued transmission mechanism enables fast data exchange among the modules; communication between the development board and the host uses a PCI-based data interaction scheme, which improves the transmission capacity for large volumes of concurrent data. The convolution window reading system is redesigned: a combination of multiple first-in-first-out (FIFO) modules performs serial-to-parallel conversion of the high-speed data stream to match the parallel computing mode of the computing module, and padding and elimination of useless data are also implemented. The adder module applies the parallel prefix adder principle, exploiting the development board's abundant gate resources and accepting a more complex circuit in exchange for lower algorithmic depth than a traditional adder and faster addition. In addition, hardware circuits for the fully connected and pooling modules are designed, yielding a modular CNN acceleration scheme. The results show that, in a video vehicle recognition scenario, the system greatly improves CNN operation speed and achieves a higher video frame rate than a CPU-based CNN calculator, and greatly reduces system power consumption without greatly reducing CNN operation speed compared with a GPU-based architecture.
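The fixed-step learning-rate update can be contrasted with poly decay in a small sketch. The thesis does not publish its exact schedule, so the step size and decay factor below are illustrative assumptions:

```python
def poly_decay(base_lr, step, max_steps, power=0.9):
    """Standard poly decay: a floating-point power per update,
    which maps poorly onto fixed-point FPGA arithmetic."""
    return base_lr * (1.0 - step / max_steps) ** power

def fixed_step_decay(base_lr, step, step_size=1000, factor=0.5):
    """Fixed-step schedule (illustrative constants): the rate changes only
    every `step_size` iterations, by a constant factor -- on hardware a
    halving factor reduces each update to a shift."""
    return base_lr * factor ** (step // step_size)
```

With `factor=0.5` every update is a single right shift of the stored rate, which is the kind of simplification that matches FPGA hardware.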
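The piecewise look-up-table softmax can be modeled in software. The segment count, input range, and shift-by-maximum normalization here are assumptions for illustration, not the thesis's exact design:

```python
import math

# Build a look-up table for exp(x) over [-8, 0]; softmax inputs can be
# shifted so the largest logit maps to 0, keeping arguments in this range.
SEGMENTS = 256  # table size is a free design choice
LUT = [math.exp(-8.0 + 8.0 * i / (SEGMENTS - 1)) for i in range(SEGMENTS)]

def exp_lut(x):
    """Approximate exp(x) for x in [-8, 0] by the nearest table entry."""
    x = max(-8.0, min(0.0, x))
    idx = round((x + 8.0) * (SEGMENTS - 1) / 8.0)
    return LUT[idx]

def softmax_lut(logits):
    """Softmax using the table instead of hardware exponentials."""
    m = max(logits)                         # shift so all arguments are <= 0
    exps = [exp_lut(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]
```

The table would be precomputed offline and stored in on-chip memory, so inference needs only an index calculation and a read per class.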
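Dynamic Q-format selection can be sketched as follows, assuming 16-bit words with one sign bit; the bit-allocation rule is illustrative, not the thesis's exact method:

```python
import math

def choose_q(values, word_bits=16):
    """Pick the fractional bit count from the largest magnitude in the
    group, rather than using one fixed Q format for the whole network."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return word_bits - 1
    int_bits = max(0, math.floor(math.log2(max_abs)) + 1)
    return word_bits - 1 - int_bits          # 1 bit reserved for sign

def quantize(values, frac_bits, word_bits=16):
    """Convert floats to fixed-point integers, saturating at word limits."""
    scale = 1 << frac_bits
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return [min(hi, max(lo, round(v * scale))) for v in values]

def dequantize(qvals, frac_bits):
    """Recover approximate float values from the fixed-point integers."""
    return [q / (1 << frac_bits) for q in qvals]
```

Small-magnitude parameter groups thus get more fractional bits, which is how a dynamic Q format saves storage relative to a single conservative format.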
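The FIFO-based serial-to-parallel window reader can be modeled behaviorally. This is a sketch of the general line-buffer technique (padding and useless-data elimination are omitted), not the thesis RTL:

```python
from collections import deque

def window_stream(pixels, width, k=3):
    """Behavioral model of a FIFO-based window reader: k-1 line FIFOs of
    depth `width` delay the serial pixel stream by one and two rows, so each
    incoming pixel completes a k-pixel column; a k x k shift-register window
    slides across the rows, emitting one flattened window per cycle once the
    window lies fully inside the image."""
    fifos = [deque() for _ in range(k - 1)]            # fifos[0] = oldest row
    win = [deque([0] * k, maxlen=k) for _ in range(k)] # window registers
    out = []
    for t, px in enumerate(pixels):
        r, c = divmod(t, width)
        # column entering the window: each line FIFO's head, then the new pixel
        col = [f[0] if len(f) == width else 0 for f in fifos] + [px]
        for row, v in zip(win, col):
            row.append(v)                              # shift window by one column
        # cascade the new pixel down through the line FIFOs
        carry = px
        for f in reversed(fifos):                      # newest row first
            f.append(carry)
            if len(f) > width:
                carry = f.popleft()
            else:
                break
        if r >= k - 1 and c >= k - 1:                  # window fully primed
            out.append([v for row in win for v in row])
    return out
```

Because each pixel is fetched from external memory once and then reused from on-chip FIFOs, the model reflects how the redesign reduces external data interaction while feeding the parallel computing units one window per cycle.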
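The parallel prefix principle behind the adder module can be illustrated with a bit-level model of a Kogge-Stone adder, one common parallel prefix network; the thesis does not specify which prefix topology its circuit uses:

```python
def kogge_stone_add(a, b, width=16):
    """Bit-level model of a Kogge-Stone parallel prefix adder: all carries
    are computed in log2(width) prefix stages (more gates, less depth)
    instead of rippling through `width` stages. Result is modulo 2**width."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]   # generate bits
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]   # propagate bits
    p0 = p[:]                                               # keep XOR bits for the sum
    dist = 1
    while dist < width:                                     # log2(width) stages
        # combine group (g, p) pairs at stride `dist`, using prior-stage values
        g = [g[i] | (p[i] & g[i - dist]) if i >= dist else g[i]
             for i in range(width)]
        p = [p[i] & p[i - dist] if i >= dist else p[i]
             for i in range(width)]
        dist *= 2
    # after the prefix, g[i] is the carry out of bit i; carry into bit 0 is 0
    carries = [0] + g[:-1]
    return sum((p0[i] ^ carries[i]) << i for i in range(width))
```

This trades the FPGA's plentiful gate resources for logarithmic carry depth, which is exactly the speed-for-area exchange the adder module makes against a traditional ripple-carry design.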
Keywords/Search Tags:Convolutional Neural Networks, Field Programmable Gate Array, Parallel Computing, Edge Computing, Acceleration System