
Design Of Neural Network Accelerator In Multiple Convolutional Modes

Posted on: 2022-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: H L Zhang
Full Text: PDF
GTID: 2518306560980009
Subject: Circuits and Systems

Abstract/Summary:
Artificial intelligence (AI) has had its ups and downs since its first appearance, and it has undergone a qualitative leap in the 21st century driven by algorithmic breakthroughs, improved computing power, and massive data. Convolutional neural networks (CNNs), as one of the representative algorithms, perform excellently in medicine, autonomous driving, speech recognition, and other fields. However, a CNN obtains higher accuracy at the price of a larger number of parameters and greater computational complexity. On the one hand, the Winograd minimal filtering algorithm can reduce this complexity by reducing the number of multiply operations in the convolutional layers, which accelerates CNN computation. On the other hand, considering performance, power consumption, privacy, and other factors in practical applications, dedicated CNN hardware accelerators become increasingly essential. Processing elements (PEs) dedicated to Winograd-based convolution are limited in their scope of application, so extra PEs must be provided for conventional convolution and fully connected layers, which results in severe underutilization of resources. In view of the above, a CNN accelerator supporting multiple convolution modes is proposed based on a field-programmable gate array (FPGA). The main work includes the following:

1. A cascade-based configurable Winograd-MAC computing structure is designed. Based on an analysis of the similarities and differences between Winograd-based and conventional convolution, the proposed computing structure makes full use of time-division multiplexing of resources: a single configurable structure can execute both convolution algorithms and handle various filter sizes. Cascading adders are attached to the computing structure; by cascading these adders over multiple levels, the structure can accumulate all products within a filter as well as the partial sums across different input feature maps (ifmaps). Compared with dedicated processing elements, the proposed computing structure reduces DSP usage by 50% and LUT usage by 34.09%.

2. A pipeline-based input matrix transform circuit is proposed. To relieve the storage pressure caused by transforming inputs in advance, a real-time transform circuit for ifmaps is designed. Because the overlapping parts of different filtering windows share the results of the primary transformation, the proposed circuit reuses these pre-computed results to avoid repetitive operations. Meanwhile, two dedicated calculation circuits are provided for the primary and secondary transformations respectively, allowing the two transform stages to execute in a pipeline. As the size of the ifmaps increases, the speed-up becomes more pronounced.

3. Two weight distribution methods are explored and compared, and a multi-mode CNN accelerator is implemented based on the unicast weight distribution method. The implemented accelerator can choose Winograd-based convolution for 3×3 filters and conventional convolution for other filter sizes. Experiments show that throughput reaches 379.98 GOPs and resource utilization efficiency is 0.495 GOPs/DSP.
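The multiplication savings that motivate the Winograd-based PEs can be illustrated with the standard F(2×2, 3×3) minimal filtering algorithm, which produces a 2×2 output tile from a 4×4 input tile using 16 element-wise multiplications instead of the 36 required by direct convolution. The sketch below is a NumPy software reference, not the hardware datapath described in the thesis; the transform matrices are the commonly used F(2×2, 3×3) constants.

```python
import numpy as np

# Winograd F(2x2, 3x3) transform matrices (standard formulation).
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)

def winograd_f2x2_3x3(d, g):
    """2x2 output tile from a 4x4 input tile d and a 3x3 filter g,
    using 16 element-wise multiplications instead of 36."""
    U = G @ g @ G.T          # filter transform (precomputable offline)
    V = B_T @ d @ B_T.T      # input transform
    M = U * V                # the 16 element-wise multiplications
    return A_T @ M @ A_T.T   # output transform

def direct_conv_3x3(d, g):
    """Reference: valid 2D sliding-window convolution (2x2 output)."""
    out = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            out[i, j] = np.sum(d[i:i+3, j:j+3] * g)
    return out

rng = np.random.default_rng(0)
d = rng.standard_normal((4, 4))
g = rng.standard_normal((3, 3))
print(np.allclose(winograd_f2x2_3x3(d, g), direct_conv_3x3(d, g)))  # True
```

Since the filter transform U can be computed once per filter and reused across all tiles, the per-tile cost is dominated by the 16 products, which is the arithmetic the thesis's configurable Winograd-MAC structure time-multiplexes with conventional multiply-accumulate.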
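The reuse of primary-transform results between overlapping windows (work item 2) can also be sketched in software. In F(2×2, 3×3), the input transform B^T·d·B splits into a primary stage (transforming each column of the tile) and a secondary stage (transforming the rows of the intermediate result); horizontally adjacent tiles advance by a stride of 2 and therefore share two of their four columns, so each ifmap column needs its primary transform computed only once. This is a behavioral illustration under those assumptions, not the thesis's circuit.

```python
import numpy as np

# Input-transform matrix for F(2x2, 3x3), as in the standard formulation.
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)

def transformed_tiles(strip):
    """Input transform V = B_T @ d @ B_T.T for every 4x4 tile (stride 2)
    in a 4-row strip, computing each column's primary transform once."""
    # Primary stage: column j of `primary` equals B_T @ strip[:, j],
    # computed once even though adjacent tiles share two columns.
    primary = B_T @ strip                      # shape (4, W)
    tiles = []
    for c in range(0, strip.shape[1] - 3, 2):  # tiles overlap by 2 columns
        # Secondary stage: row transform of 4 already-transformed columns.
        tiles.append(primary[:, c:c+4] @ B_T.T)
    return tiles

rng = np.random.default_rng(1)
strip = rng.standard_normal((4, 10))           # 10 columns -> 4 tiles
for k, V in enumerate(transformed_tiles(strip)):
    d = strip[:, 2*k:2*k+4]
    assert np.allclose(V, B_T @ d @ B_T.T)     # matches the direct transform
```

For this 10-column strip the shared primary stage performs 10 column transforms instead of the 16 (4 tiles × 4 columns) a naive per-tile transform would need, and the two stages operate on independent hardware, which is what permits the pipelining described above.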
Keywords/Search Tags:Convolutional neural network, Winograd algorithm, Hardware Accelerator, Matrix transformations, Configurable