
Design And Implementation Of Lightweight Convolutional Neural Network Accelerator On SoPC

Posted on: 2021-10-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y Lei
Full Text: PDF
GTID: 2518306476950459
Subject: Electronics and Communications Engineering
Abstract/Summary:
In recent years, with the development of artificial intelligence, convolutional neural networks (CNNs) have been widely adopted in the computer vision field. CNNs have shown their superiority in tasks such as image classification and object detection. However, their high computational complexity makes them hard to deploy directly on edge/mobile devices, whose performance and energy budgets are extremely limited. Lightweight networks and dedicated circuits are two solutions that make it possible to run CNNs on such devices. Following these ideas, this thesis proposes a lightweight CNN accelerator on an SoPC (System on a Programmable Chip), on which the inference stage of MobileNetV2 runs.

First, the parallelism in MobileNetV2 is analyzed so that loops can be unrolled and the computation parallelized. Different parallelism strategies are applied according to the convolution type. The runtime under different parallel factors is roughly estimated by a quantitative model, and the parallel factors are then decided.

Next, the accelerator system on the SoPC is implemented; it can be divided into hardware and software parts. The main function of the hardware is to compute the convolution layers. It consists of control logic, a convolution module, DMA, a parameter cache, feature-map buffers, and CSRs. In the convolution module, the different forms of parallelism are achieved by unified units with a configurable data path. Batch normalization is optimized to avoid division and square-root operations. The DMA and a parameter cache with a pre-fetch function are introduced to reduce the latency of reading parameters from DRAM. Different kinds of on-chip buffers store the high-throughput intermediate data. The software consists of middleware on the PC and a program on the PS side. The middleware extracts the parameters, converts them to fixed-point numbers, and packs them into a format readable by the DMA. The program on the PS controls the PL, computes the FC layer, and performs classification. The FC layer is optimized to run on the NEON unit, rather than the FPU, for parallelism. Moreover, the convolution layers and the FC layer are pipelined for multi-image classification, so that the PL and PS work simultaneously to deliver higher throughput.

Finally, a test platform based on a PC and a ZC702 board is built to test the accelerator system. The accelerator runs at 150 MHz with the MobileNetV2-0.5-96 model. Data are quantized to signed 16-bit fixed-point numbers, and 169 DSPs are utilized in the system. The system classifies an image in 2.2 ms, achieving a performance of 16.4 GOPS.
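The abstract mentions a quantitative model for estimating runtime under different parallel factors, but does not give its equations. The sketch below shows one common form of such an estimate; the function names and parameters (`pf_in`, `pf_out`, the 150 MHz clock default) are illustrative assumptions, not taken from the thesis:

```python
from math import ceil

def conv_cycles(h, w, cin, cout, k, pf_in, pf_out):
    """Rough cycle estimate for a standard convolution layer when the
    accelerator processes pf_in input channels and pf_out output channels
    per cycle (ideal MAC utilization, memory stalls ignored)."""
    return h * w * k * k * ceil(cin / pf_in) * ceil(cout / pf_out)

def runtime_ms(cycles, freq_hz=150e6):
    """Convert a cycle count to milliseconds at the accelerator clock."""
    return cycles / freq_hz * 1e3
```

Sweeping `pf_in`/`pf_out` over such a model for every layer lets the designer pick factors that balance DSP usage against total runtime before committing to an RTL implementation.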
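The batch-normalization optimization that avoids division and square root is a standard folding: since the BN parameters are constants at inference time, y = γ(x − μ)/√(σ² + ε) + β can be precomputed offline into a single multiply-add y = a·x + b. A minimal sketch of that precomputation (the thesis does not give its exact form):

```python
import math

def fold_bn(gamma, beta, mean, var, eps=1e-5):
    """Fold batch normalization into one multiply-add:
    y = gamma*(x - mean)/sqrt(var + eps) + beta  ==  a*x + b.
    The division and square root happen once, offline, so the
    accelerator only needs a multiplier and an adder at runtime."""
    a = gamma / math.sqrt(var + eps)
    b = beta - a * mean
    return a, b
```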
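The middleware's conversion of parameters to signed 16-bit fixed-point numbers can be sketched as below. The thesis does not state the Q format, so the choice of 8 fractional bits here is a hypothetical example; only the int16 width and saturation behavior follow from the abstract:

```python
def to_fixed(x, frac_bits=8):
    """Quantize a float to a signed 16-bit fixed-point integer with
    frac_bits fractional bits, saturating at the int16 range."""
    q = int(round(x * (1 << frac_bits)))
    return max(-32768, min(32767, q))

def from_fixed(q, frac_bits=8):
    """Recover the approximate float value of a fixed-point integer."""
    return q / (1 << frac_bits)
```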
Keywords/Search Tags:Convolutional Neural Network, Deep Learning, MobileNetV2, Hardware Acceleration, FPGA