
Design Of Lightweight And Configurable Convolutional Neural Network Accelerator

Posted on: 2022-10-09
Degree: Master
Type: Thesis
Country: China
Candidate: G M Yang
Full Text: PDF
GTID: 2518306605470014
Subject: Master of Engineering
Abstract/Summary:
In recent years, convolutional neural networks (CNNs) have developed rapidly: network accuracy has improved continuously and application scenarios have grown ever broader. At the same time, model storage requirements and computational loads keep increasing. General-purpose processors such as CPUs, with their serial instruction pipelines, are poorly suited to the parallel operations of CNNs. Graphics processing units (GPUs) accelerate convolution effectively through large numbers of fixed- and floating-point arithmetic units, but their high power consumption and area make them impractical to deploy in embedded devices. The FPGA is a CNN platform that has emerged in recent years; its rich computing resources and design flexibility make it better suited to CNN implementation than the GPU, yet the energy efficiency of FPGA convolution still falls below that of the application-specific integrated circuit (ASIC). ASIC chips accelerate convolution through dedicated arithmetic units and storage structures and can achieve higher energy efficiency. Meanwhile, network lightweighting methods such as numerical quantization can reduce model storage and computation without sacrificing much accuracy, making embedded deployment of deep CNN models feasible. The layered nature of the CNN also makes its hardware implementation configurable, avoiding the waste of hardware resources that network updates would otherwise cause.

On this basis, after in-depth study of CNN acceleration, this thesis designs a lightweight, configurable CNN accelerator suitable for embedded platforms such as drones and autonomous vehicles. Lightweight convolution, pipelined post-processing, parallel convolution operations, model structuring, and optimized memory access are designed to accelerate CNN operations.

This thesis first introduces the basic structure of the CNN, analyzes the computational characteristics of each network layer, and introduces the SSD model used in the tests of this thesis along with the difficulties and limitations of the traditional CNN. Second, by studying model lightweighting methods and adopting the design ideas of data quantization, input-channel-parallel operation, and pipelined post-processing, a quantized parallel convolution unit for the lightweight model is designed. Then the quantized parallel convolution array is designed; parallelism over output channels and the feature map greatly increases convolution parallelism and raises the operating speed of the CNN. In addition, this thesis designs the overall structure of the lightweight configurable CNN accelerator, realizes configurability through model-parameter parsing, and controls each functional module of the accelerator with a state machine. A multi-channel ping-pong memory-access structure is also designed for the accelerator's reads and writes; memory-access time is hidden under computation time, avoiding data stalls and realizing acceleration.

Finally, this thesis uses the PyTorch framework to quantize the SSD network to INT8 to verify the feasibility of the lightweight model. The accelerator circuit is then simulated, and an FPGA is used to test and evaluate the accelerator's performance. At a working frequency of 100 MHz, the accelerator's power consumption is 7.6 W, its peak computing performance reaches 1638.4 GOPS, its energy efficiency is 215.58 GOPS/W, and one SSD inference takes 15.89 ms. Compared with other implementation platforms, the accelerator designed in this thesis achieves an energy-efficiency improvement of 3.5 to 479 times.
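The INT8 quantization step mentioned above can be illustrated with a minimal sketch of symmetric per-tensor quantization, a common scheme for CNN weights. The abstract does not specify the thesis's exact quantization method, so the scale computation and clipping below are assumptions, not the thesis's implementation.

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization (illustrative scheme).

    Maps real-valued weights to [-128, 127] using a single scale
    derived from the largest absolute value.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate real values from INT8 codes."""
    return [v * scale for v in q]

w = [0.5, -1.27, 0.0, 1.0]
q, s = quantize_int8(w)
# q = [50, -127, 0, 100]; dequantize(q, s) approximates w
```

Per-channel scales usually recover more accuracy than a single per-tensor scale, at the cost of extra bookkeeping in the accelerator's post-processing stage.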
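The ping-pong memory-access idea can be sketched as two buffers that alternate roles each tile: while the compute unit consumes one buffer, the memory interface fills the other, so access latency is hidden behind computation. The class and tile names below are illustrative, not taken from the thesis's RTL design.

```python
class PingPongBuffer:
    """Two buffers swap roles each tile: one is filled by the memory
    interface while the other is consumed by the compute unit."""
    def __init__(self):
        self.buffers = [None, None]
        self.fill_idx = 0  # buffer currently being written by memory

    def fill(self, data):
        self.buffers[self.fill_idx] = data

    def swap(self):
        # The freshly filled buffer becomes the compute buffer;
        # the drained one is refilled next.
        self.fill_idx ^= 1

    @property
    def compute_buffer(self):
        return self.buffers[self.fill_idx ^ 1]

# Simulated pipeline over input tiles: prefetch tile i+1 while
# "computing" on tile i.
tiles = ["tile0", "tile1", "tile2"]
pp = PingPongBuffer()
pp.fill(tiles[0])   # preload the first tile
pp.swap()
processed = []
for i in range(len(tiles)):
    if i + 1 < len(tiles):
        pp.fill(tiles[i + 1])            # overlaps with compute
    processed.append(pp.compute_buffer)  # compute on current tile
    pp.swap()
# processed == ["tile0", "tile1", "tile2"]
```

In hardware the fill and compute steps run concurrently; this sequential simulation only shows the buffer hand-off that makes the overlap safe.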
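The reported energy-efficiency figure follows directly from the peak throughput and power numbers, as a quick check confirms:

```python
peak_gops = 1638.4   # peak computing performance from the abstract
power_w = 7.6        # power consumption at 100 MHz
efficiency = peak_gops / power_w
# round(efficiency, 2) == 215.58 GOPS/W, matching the reported figure
```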
Keywords/Search Tags:Convolutional Neural Network Accelerator, Lightweight, Configurable, Convolution Parallel