
Research and Design of Key Technologies of an FPGA-Based Convolutional Neural Network Accelerator

Posted on: 2021-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: J L Ma
Full Text: PDF
GTID: 2428330623482229
Subject: Control Science and Engineering
Abstract/Summary:
Rapidly developing convolutional neural networks have demonstrated excellent recognition and classification capability in image classification, object detection, semantic segmentation, and other application fields, and improving their performance has become a key breakthrough point for their wider promotion and application. Facing deeper convolutional layers, growing computation volume, and growing storage capacity, traditional software acceleration methods struggle to meet real-time application requirements, so research on hardware acceleration of convolutional neural networks has become a necessary direction. This thesis analyzes the convolutional neural network algorithm and researches and designs a high-performance convolutional neural network accelerator that effectively improves algorithm performance. The main work is as follows.

1. An accelerated instruction system for convolutional neural networks is studied and designed. Based on a thorough analysis of typical convolutional neural network topologies and their arithmetic operations, and in order to reduce the data storage required by a single computing core and improve data reuse, an algorithm partitioning technique is studied and a basic operation function set is designed for the convolutional, pooling, and fully connected layers, so that an entire network can be mapped through function calls. After fully analyzing the operations of the convolutional layer, pooling layer, fully connected layer, and activation function, the operation logic of the network is summarized and the accelerator's basic operation instructions are researched and designed. After fully analyzing the data position transformations that occur during the operation of these layers, the storage logic of the network is summarized, and the accelerator's instruction addressing method and storage array transposition method are researched and designed.

2. A convolutional neural network accelerator based on a near-computing storage array architecture is studied and proposed. To reduce the latency caused by memory access, a near-computing storage array architecture is proposed that, exploiting the network's storage logic, realizes rapid position transformation of feature map data through operations such as shift and line feed. To reduce the demand for computing and storage resources, a layer-by-layer parameter quantization method is proposed: under the premise of preserving the algorithm's accuracy, the floating-point data of each layer is converted to a fixed-point representation, and the basic operation unit is designed according to the requirements of the instruction system. To further optimize accelerator performance, a multi-granularity parallel optimization strategy is designed and implemented, exploiting three levels of parallelism: the convolution window, the input channels, and the output channels of the feature map.

3. A comprehensive verification platform for the accelerator is built, and functional verification and performance testing are performed. An algorithm verification platform, a simulation verification platform, and a hardware test platform are constructed. Taking the VGG-16 algorithm as an example, functions are partitioned, test stimuli are generated from the ImageNet dataset, and the correctness of the accelerator is proven through functional verification and system verification. Resource occupancy is evaluated for accelerators with different configurations, and the accelerator's performance is tested and evaluated. In testing, the accelerator with a 256-core configuration reaches a computing performance of 50.47 GOPS, which is 51.62 times that of a parallel MATLAB software implementation and 3.15 to 9.61 times that of existing FPGA-based hardware accelerators.
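The idea of mapping a whole network through calls to a basic operation function set can be sketched as follows. This is a minimal illustration of the concept only; the names (Instr, conv, pool, fc) and parameters are hypothetical and are not the thesis's actual instruction mnemonics or encoding.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    """One accelerator operation: an opcode plus its layer parameters."""
    op: str      # "conv", "pool", or "fc"
    params: dict

def conv(kernel, stride, cin, cout):
    return Instr("conv", {"kernel": kernel, "stride": stride,
                          "cin": cin, "cout": cout})

def pool(kernel, stride):
    return Instr("pool", {"kernel": kernel, "stride": stride})

def fc(nin, nout):
    return Instr("fc", {"nin": nin, "nout": nout})

# A tiny network expressed as a sequence of function calls; a real
# mapper would walk the network topology and emit one call per layer.
program = [
    conv(3, 1, 3, 64),
    pool(2, 2),
    fc(64 * 16 * 16, 10),
]

for instr in program:
    print(instr.op, instr.params)
```

In this style the instruction stream itself captures the partitioned workload, so each computing core only ever sees the parameters of the operation it is currently executing.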
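The layer-by-layer parameter quantization described above can be sketched in a simplified form: for each layer, pick a fractional bit width that still covers the layer's dynamic range, then round the floating-point values to that fixed-point grid. This is a generic illustration of per-layer fixed-point conversion under an assumed 16-bit word, not the thesis's exact scheme.

```python
import math

def choose_frac_bits(values, total_bits=16):
    """Pick fractional bits so the largest magnitude in this layer
    still fits, reserving one bit for the sign."""
    max_abs = max(abs(v) for v in values)
    int_bits = max(0, math.floor(math.log2(max_abs)) + 1) if max_abs > 0 else 0
    return total_bits - 1 - int_bits

def quantize(values, frac_bits):
    """Round each float to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return [round(v * scale) / scale for v in values]

layer_weights = [0.75, -1.5, 0.333]
fb = choose_frac_bits(layer_weights)   # each layer gets its own format
print(fb, quantize(layer_weights, fb))
```

Because the format is chosen per layer rather than globally, layers with small weights keep more fractional precision, which is what lets the fixed-point network stay close to the floating-point accuracy.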
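As a back-of-envelope check of the reported figures, the 50.47 GOPS result and the 51.62x speedup together imply a software baseline throughput of roughly 0.98 GOPS:

```python
accel_gops = 50.47           # 256-core configuration, as reported
speedup_vs_matlab = 51.62    # reported speedup over the MATLAB version
baseline_gops = accel_gops / speedup_vs_matlab
print(f"implied software baseline: {baseline_gops:.3f} GOPS")
```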
Keywords/Search Tags:convolutional neural network, FPGA, instruction design, parameter quantization, parallel processing