
FPGA-based Neural Network Accelerator Operation And Memory Access Optimization Design

Posted on: 2022-11-21
Degree: Master
Type: Thesis
Country: China
Candidate: H Z Wang
Full Text: PDF
GTID: 2518306788456774
Subject: Automation Technology
Abstract/Summary:
Deep neural networks (DNNs) can effectively extract important information from massive and complex data, and they have been widely applied in image recognition, speech recognition, natural language processing, and other fields. However, these networks incur large computational and memory overheads, which limits their use in resource-constrained scenarios. Traditional CPU and GPU processors cannot satisfy the high-performance and low-energy-consumption requirements of DNN algorithms in many application scenarios, owing to the limited capacity of the computing and memory units in their hardware architectures. Researchers have therefore proposed designing hardware structures suited to neural network algorithms on field-programmable gate arrays (FPGAs). Many existing studies use FPGAs to accelerate DNN algorithms, but most simply map the neural network onto the FPGA. Few combine the programmable nature of the FPGA with the characteristics of the weight data and input data in the DNN and co-optimize the algorithm and hardware structure for overall performance. This thesis focuses on designing a DNN acceleration scheme tailored to the FPGA architecture. The main work is as follows.

This thesis addresses the heavy data transmission and computation in existing accelerators. By analyzing the characteristics of the input data and the physical meaning of the weight data, we propose a CSCF strategy to compress the input data and design a channel group-based calculation (CGBC) algorithm to reduce the number of convolution operations, thereby reducing memory and computing resource consumption while preserving accuracy.

We first design the CSCF processing strategy to compress the massive input data. The collected data are scanned and compressed, and the compressed data are then classified and stored according to the positions of consecutive zero-value pixel blocks. A classification calculation unit corresponding to each storage unit is designed on the FPGA architecture, and the model's performance is improved by integrating the first-layer convolution of the convolutional neural network with the compressed storage of the input data.

Furthermore, to reduce the computational load of DNN models, we adopt a multiplication-to-judgment computation (CMJC) approach that converts each multiplication into a single judgment. Because existing CMJC algorithms usually suffer from lower output precision, we propose a new CMJC algorithm, the channel group-based computation (CGBC) algorithm. It uses the channel group of each convolution kernel as the minimum calculation unit, instead of extracting a common quantization factor for the entire kernel, so the characteristics of the weights are preserved and high accuracy is maintained. To support the CMJC algorithm, we design an FPGA-based reduced-multiplication (RM) accelerator that implements a prediction mechanism for the CGBC algorithm, further reducing the amount of computation.

Experimental results show that, compared with a traditional neural network accelerator, the CSCF algorithm achieves a speedup of 1.8x to 2.1x with only small fluctuations in hardware resource utilization. The CGBC-RM accelerator improves speedup by 2.0x to 2.4x and energy efficiency by 7.1x to 8.1x, while its output accuracy remains very close to that of the traditional algorithm.
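The abstract does not give the exact CGBC formulation. As a rough illustration only, the following Python sketch shows how a per-channel-group scale factor (rather than one quantization factor for the whole convolution kernel) lets each multiplication be replaced by a sign judgment followed by an add or subtract. The function names, the group size, and the mean-absolute-value scale are illustrative assumptions, not the thesis's actual design.

```python
import numpy as np

def quantize_per_channel_group(weights, group_size=4):
    """Approximate a kernel of shape (C, K, K) by per-group scale * sign(weight).

    Each channel group keeps its own scale factor, which is the core idea
    attributed to CGBC in the abstract; the scale choice here (mean absolute
    value per group) is an assumption for illustration.
    """
    C = weights.shape[0]
    signs = np.sign(weights)
    scales = np.empty(C)
    for g in range(0, C, group_size):
        grp = weights[g:g + group_size]
        scales[g:g + group_size] = np.abs(grp).mean()  # one shared scale per group
    return signs, scales

def cgbc_dot(inputs, signs, scales):
    """Multiplication-free inner product: judge the weight sign, then add or subtract.

    Zero-valued weights are treated as negative here; a real design would
    handle them explicitly.
    """
    acc_per_channel = np.where(signs > 0, inputs, -inputs).sum(axis=(1, 2))
    return float((acc_per_channel * scales).sum())

# Toy usage: compare against the exact floating-point inner product.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 3, 3))
x = rng.standard_normal((8, 3, 3))
s, a = quantize_per_channel_group(w)
print("exact:", float((w * x).sum()), "cgbc approx:", cgbc_dot(x, s, a))
```

Because the scale is shared only within a small channel group rather than across the entire kernel, the approximation tracks the per-group weight magnitudes more closely, which is consistent with the abstract's claim that CGBC preserves the characteristics of the weights better than a single common quantization factor.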
Keywords/Search Tags:FPGA, high performance, deep neural network accelerator